class GithubStats::Data

Data class for calculations

Attributes

raw[R]
to_a[R]

Public Class Methods

new(data)

Create a data object and turn on caching

# File lib/githubstats/data.rb, line 35
def initialize(data)
  @raw = data.map { |d, s| Datapoint.new(Date.parse(d), s.to_i) }.sort_by(&:date)
  enable_caching %i[to_h today streaks longest_streak streak max mean
                    std_var quartile_boundaries quartiles start_date
                    end_date]
end
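
As a rough illustration of the input the constructor appears to expect, the sketch below repeats the mapping step on its own. `Datapoint` here is a stand-in Struct (an assumption; the real `GithubStats::Datapoint` is defined elsewhere in the library), and the input pairs are made up. Each entry is a `[date_string, score]` pair, and the result is sorted chronologically whatever the input order.

```ruby
require 'date'

# Stand-in for GithubStats::Datapoint (assumed to expose date and score).
Datapoint = Struct.new(:date, :score)

# Hypothetical input: [date string, score] pairs, not necessarily in order.
input = [['2024-01-03', '4'], ['2024-01-01', 0], ['2024-01-02', '2']]

# Same transformation as in initialize: parse, coerce, sort by date.
raw = input.map { |d, s| Datapoint.new(Date.parse(d), s.to_i) }.sort_by(&:date)

raw.each { |point| puts "#{point.date} #{point.score}" }
```

Scores given as strings are coerced with `to_i`, so `'4'` and `4` are treated alike.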

Public Instance Methods

[](date)

The score for a given day, where the day is passed as a date string

# File lib/githubstats/data.rb, line 74
def [](date)
  to_h[Date.parse(date)]
end

end_date()

The end of the dataset

# File lib/githubstats/data.rb, line 60
def end_date
  @raw.last.date
end

gh_outliers()

Outliers as calculated by GitHub. Only the first 1 or 3 outliers are considered, depending on the mean and max of the set.

# File lib/githubstats/data.rb, line 148
def gh_outliers
  outliers.take(max.score - mean < 6 || max.score < 15 ? 1 : 3)
end
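
The condition in `gh_outliers` is easy to misread because `||` binds more tightly than the ternary operator. The sketch below isolates it in a hypothetical helper, `outlier_limit`, which is not part of the library, to show which inputs keep one outlier and which keep three.

```ruby
# How many outliers are kept, following the condition in gh_outliers.
# `||` binds tighter than `?:`, so this reads (a || b) ? 1 : 3.
def outlier_limit(max_score, mean)
  max_score - mean < 6 || max_score < 15 ? 1 : 3
end

puts outlier_limit(10, 8.0)  # max close to the mean
puts outlier_limit(12, 2.0)  # far above the mean, but max below 15
puts outlier_limit(30, 4.0)  # max at least 15 and at least 6 above the mean
```

Three outliers are kept only when the maximum is at least 15 and sits at least 6 above the mean.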

longest_streak()

The longest streak, or an empty array if there are no streaks

# File lib/githubstats/data.rb, line 99
def longest_streak
  return [] if streaks.empty?
  streaks.max_by(&:length)
end

max()

The highest scoring day

# File lib/githubstats/data.rb, line 115
def max
  @raw.max_by(&:score)
end

mean()

The mean score

# File lib/githubstats/data.rb, line 122
def mean
  scores.sum / @raw.size.to_f
end

outliers()

Outliers of the set

# File lib/githubstats/data.rb, line 139
def outliers
  return [] if scores.uniq.size < 5
  scores.select { |x| ((mean - x) / std_var).abs > GITHUB_MAGIC }.uniq
end
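
The selection above is a z-score filter: a score is kept when its distance from the mean, measured in standard deviations, exceeds a threshold. The sketch below works on a plain array of scores. The value of `GITHUB_MAGIC` is not shown in this page, so `threshold` is a hypothetical stand-in parameter, and `zscore_outliers` is an illustrative name rather than a library method.

```ruby
# Scores whose z-score magnitude exceeds `threshold`.
# `threshold` stands in for GITHUB_MAGIC, whose value is not shown here.
def zscore_outliers(scores, threshold)
  return [] if scores.uniq.size < 5 # too few distinct values to judge

  mean = scores.sum / scores.size.to_f
  # Sample standard deviation (n - 1 denominator), as in std_var.
  std = Math.sqrt(scores.sum { |x| (x - mean)**2 } / (scores.size - 1))
  scores.select { |x| ((mean - x) / std).abs > threshold }.uniq
end

p zscore_outliers([1, 2, 3, 4, 5, 100], 1.5)
p zscore_outliers([1, 1, 2, 2, 100], 1.5)
```

The second call returns an empty array because the set has fewer than five distinct scores, even though 100 is clearly far from the rest.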

pad(fill_value = -1, data = @raw.clone)

Pad the dataset to full week increments

# File lib/githubstats/data.rb, line 190
def pad(fill_value = -1, data = @raw.clone)
  data = _pad data, 0, fill_value, 0
  _pad data, -1, fill_value, 6
end
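
To make the padding concrete, the sketch below reproduces the loop from the private `_pad` helper in a standalone method, here called `pad_end` (a hypothetical name), with a stand-in `Datapoint` Struct. It pads a two-day dataset back to the preceding Sunday and forward to the following Saturday.

```ruby
require 'date'

Datapoint = Struct.new(:date, :score) # stand-in for GithubStats::Datapoint

# Same loop as _pad: grow one end of the list a day at a time until the
# date at `index` falls on weekday `goal` (0 = Sunday, 6 = Saturday).
def pad_end(data, index, fill_value, goal)
  step = (index * -2) - 1 # index 0 steps back a day, index -1 steps forward
  until data[index].date.wday == goal
    data.insert(index, Datapoint.new(data[index].date + step, fill_value))
  end
  data
end

# Wednesday 2024-01-03 and Thursday 2024-01-04
raw = [Datapoint.new(Date.new(2024, 1, 3), 5),
       Datapoint.new(Date.new(2024, 1, 4), 2)]
padded = pad_end(pad_end(raw.clone, 0, -1, 0), -1, -1, 6)

puts padded.map { |pt| "#{pt.date}:#{pt.score}" }.join(' ')
```

Because the padding runs on a clone, the original list keeps its two entries while the padded copy spans a full Sunday-to-Saturday week.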

quartile(score)

Return the quartile of a given score

# File lib/githubstats/data.rb, line 182
def quartile(score)
  return nil if score.negative? || score > max.score
  quartile_boundaries.count { |bound| score > bound }
end
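
The quartile index is simply the number of boundaries strictly below the score. The sketch below uses a made-up boundary list and a hypothetical helper, `quartile_for`. It assumes the last boundary equals the maximum score, which holds for the output of `quartile_boundaries`.

```ruby
# Hypothetical boundaries, shaped like the output of quartile_boundaries.
BOUNDS = [0, 2, 4, 6, 8].freeze

# Quartile index = number of boundaries strictly below the score.
def quartile_for(score, bounds = BOUNDS)
  return nil if score.negative? || score > bounds.last

  bounds.count { |bound| score > bound }
end

[0, 1, 2, 3, 8, 9].each { |s| puts "#{s} -> #{quartile_for(s).inspect}" }
```

A score equal to a boundary falls in that boundary's quartile, since the upper bounds are inclusive. Scores outside the range yield `nil`.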

quartile_boundaries()

The boundaries of the quartiles. The index represents the quartile number, and the value is the inclusive upper bound of that quartile.

# File lib/githubstats/data.rb, line 157
def quartile_boundaries # rubocop:disable Metrics/AbcSize
  top = scores.reject { |x| gh_outliers.include? x }.max
  range = (1..top).to_a
  range = [0] * 3 if range.empty?
  mids = (1..3).map do |q|
    index = (q * range.size / 4) - 1
    range[index]
  end
  bounds = (mids + [max.score]).uniq.sort
  ([0] * (5 - bounds.size)) + bounds
end
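
A few worked inputs help show how the boundaries come out. The sketch below keeps the arithmetic from the method above but leaves out the outlier filtering. `top` (highest non-outlier score) and `max_score` (overall maximum) are passed in directly, and `boundaries` is a hypothetical name.

```ruby
# Boundary computation from quartile_boundaries, without outlier filtering.
# top: highest non-outlier score; max_score: overall maximum score.
def boundaries(top, max_score)
  range = (1..top).to_a
  range = [0] * 3 if range.empty?
  # Pick the value at each quarter mark of the 1..top range.
  mids = (1..3).map { |q| range[(q * range.size / 4) - 1] }
  bounds = (mids + [max_score]).uniq.sort
  # Left-pad with zeros so there are always five entries.
  ([0] * (5 - bounds.size)) + bounds
end

p boundaries(8, 8)  # => [0, 2, 4, 6, 8]
p boundaries(8, 20) # => [0, 2, 4, 6, 20]
p boundaries(2, 2)  # => [0, 0, 0, 1, 2]
p boundaries(0, 0)  # => [0, 0, 0, 0, 0]
```

When an outlier is excluded from `top`, the top boundary still stretches to the overall maximum, so every score lands in some quartile.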

quartiles()

Return the list split into quartiles

# File lib/githubstats/data.rb, line 172
def quartiles
  quartiles = Array.new(5) { [] }
  @raw.each_with_object(quartiles) do |elem, acc|
    acc[quartile(elem.score)] << elem
  end
end

scores()

Scores in chronological order

# File lib/githubstats/data.rb, line 81
def scores
  @raw.map(&:score)
end

start_date()

The start of the dataset

# File lib/githubstats/data.rb, line 54
def start_date
  @raw.first.date
end

std_var()

The sample standard deviation of the scores (two-pass, n - 1 denominator), despite the method name

# File lib/githubstats/data.rb, line 129
def std_var
  first_pass = @raw.reduce(0) do |acc, elem|
    ((elem.score.to_f - mean)**2) + acc
  end
  Math.sqrt(first_pass / (@raw.size - 1))
end
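
For reference, the same two-pass calculation on a plain array of scores might look like the sketch below. `sample_std_dev` is a hypothetical name. The first pass computes the mean, and the second sums the squared deviations before dividing by n - 1.

```ruby
# Two-pass sample standard deviation: first the mean, then the summed
# squared deviations divided by n - 1.
def sample_std_dev(scores)
  mean = scores.sum / scores.size.to_f
  squared = scores.reduce(0) { |acc, x| ((x.to_f - mean)**2) + acc }
  Math.sqrt(squared / (scores.size - 1))
end

# Mean is 5, squared deviations sum to 32, so the result is sqrt(32 / 7).
puts sample_std_dev([2, 4, 4, 4, 5, 5, 7, 9]).round(3)
```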

streak()

The current streak, or an empty array if there is none. A streak counts as current if its last day is today or yesterday.

# File lib/githubstats/data.rb, line 107
def streak
  return [] if streaks.empty?
  streaks.last.last.date >= Date.today - 1 ? streaks.last : []
end

streaks()

All streaks for a user

# File lib/githubstats/data.rb, line 88
def streaks
  streaks = @raw.each_with_object(Array.new(1, [])) do |point, acc|
    point.score.zero? ? acc << [] : acc.last << point
  end
  streaks.reject!(&:empty?)
  streaks
end
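
The accumulator pattern above starts a new run on every zero-score day and drops the empty runs afterwards. The sketch below applies it to a plain array of scores instead of datapoints. `split_streaks` is a hypothetical name.

```ruby
# Split a chronological list of scores into runs of non-zero days,
# using the same accumulator pattern as streaks.
def split_streaks(scores)
  runs = scores.each_with_object([[]]) do |score, acc|
    score.zero? ? acc << [] : acc.last << score
  end
  runs.reject(&:empty?)
end

runs = split_streaks([1, 2, 0, 0, 3, 0, 4, 5, 6])
p runs                   # => [[1, 2], [3], [4, 5, 6]]
p runs.max_by(&:length)  # => [4, 5, 6]
```

Consecutive zero days produce several empty runs, which is why the final filtering step is needed.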

to_h()

The data as a hash where the keys are dates and values are scores

# File lib/githubstats/data.rb, line 45
def to_h
  @raw.reduce(Hash.new(0)) do |acc, elem|
    acc.merge(elem.date => elem.score)
  end
end
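
One consequence of seeding the reduction with `Hash.new(0)` is that lookups for dates outside the dataset return 0 rather than nil, which carries over to `[]` and `today`. A minimal sketch with made-up dates:

```ruby
require 'date'

# Build a date => score hash with a default of 0, as to_h does.
pairs = [[Date.new(2024, 1, 1), 3], [Date.new(2024, 1, 2), 0]]
scores_by_date = pairs.reduce(Hash.new(0)) do |acc, (date, score)|
  acc.merge(date => score) # merge copies the receiver, keeping its default
end

puts scores_by_date[Date.new(2024, 1, 1)] # a recorded day
puts scores_by_date[Date.new(2030, 1, 1)] # a day outside the dataset
```

A missing day and a recorded zero-score day are therefore indistinguishable through plain lookups. `key?` can tell them apart if that matters.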

today()

The score for today

# File lib/githubstats/data.rb, line 67
def today
  to_h[Date.today]
end

Private Instance Methods

_pad(data, index, fill_value, goal)

# File lib/githubstats/data.rb, line 197
def _pad(data, index, fill_value, goal)
  mod = (index * -2) - 1 # 0 index moves -1 in time, -1 move +1 in time
  point = GithubStats::Datapoint
  data.insert index, point.new(data[index].date + mod, fill_value) until data[index].date.wday == goal
  data
end