class SupervisedLearning::LinearRegression

This class uses linear regression to make predictions based on a training set. For datasets of less than 1000 columns, use predict since this will give the most accurate prediction. For larger datasets where the predict method is too slow, use predict_advanced. The algorithms in predict and predict_advanced were provided by Andrew Ng (Stanford University). @author Michael Imstepf

Public Class Methods

new(training_set) click to toggle source

Initializes a LinearRegression object with a training set @param training_set [Matrix] training_set, each feature/dimension has one column and the last column is the output column (type of value predict will return) @raise [ArgumentError] if training_set is not a Matrix or does not have at least two columns and one row

# File lib/supervised_learning.rb, line 17
def initialize(training_set)      
  @training_set = training_set
  raise ArgumentError, 'input is not a Matrix' unless @training_set.is_a? Matrix
  raise ArgumentError, 'Matrix must have at least 2 columns and 1 row' unless @training_set.column_size > 1

  @number_of_features = @training_set.column_size - 1
  @number_of_training_examples = @training_set.row_size      

  @feature_set = @training_set.clone
  @feature_set.hpop # remove output set

  @output_set = @training_set.column_vectors.last      
end

Public Instance Methods

predict(prediction) click to toggle source

Makes prediction using normalization. This algorithm is the most accurate one but with large sets (more than 1000 columns) it might take too long to calculate. @param prediction [Matrix] prediction

# File lib/supervised_learning.rb, line 35
def predict(prediction)
  # add ones to feature set
  feature_set = Matrix.hconcat(Matrix.one(@number_of_training_examples, 1), @feature_set)

  validate_prediction_input(prediction)
        
  transposed_feature_set = feature_set.transpose # only transpose once for efficiency
  theta = (transposed_feature_set * feature_set).inverse * transposed_feature_set * @output_set

  # add column of ones to prediction
  prediction = Matrix.hconcat(Matrix.one(prediction.row_size, 1), prediction)
  
  result_vectorized = prediction * theta
  result = result_vectorized.to_a.first.to_f
end
predict_advanced(prediction, learning_rate = 0.01, iterations = 1000, debug = false) click to toggle source

Makes prediction using gradient descent. This algorithm is requires less computing power than predict but is less accurate since it uses approximation. @param prediction [Matrix] prediction

# File lib/supervised_learning.rb, line 55
def predict_advanced(prediction, learning_rate = 0.01, iterations = 1000, debug = false) 
  validate_prediction_input(prediction)

  feature_set = normalize_feature_set(@feature_set)
  # add ones to feature set after normalization
  feature_set = Matrix.hconcat(Matrix.one(@number_of_training_examples, 1), feature_set)

  # prepare theta column vector with zeros
  theta = Matrix.zero(@number_of_features+1, 1)

  iterations.times do        
    theta = theta - (learning_rate * (1.0/@number_of_training_examples) * (feature_set * theta - @output_set).transpose * feature_set).transpose
    if debug
      puts "Theta: #{theta}"
      puts "Cost: #{calculate_cost(feature_set, theta)}"
    end
  end

  # normalize prediction
  prediction = normalize_prediction(prediction)

  # add column of ones to prediction
  prediction = Matrix.hconcat(Matrix.one(prediction.row_size, 1), prediction)

  result_vectorized = prediction * theta
  result = result_vectorized[0,0]
end

Private Instance Methods

calculate_cost(feature_set, theta) click to toggle source

Calculates cost of current theta. The closer to 0, the more accurate the prediction will be. @param feature_set [Matrix] feature set @param theta [Matrix] theta @return [Float] cost

# File lib/supervised_learning.rb, line 144
def calculate_cost(feature_set, theta)
  cost_vectorized = 1.0/(2 * @number_of_training_examples) * (feature_set * theta - @output_set).transpose * (feature_set * theta - @output_set)
  cost_vectorized[0,0]
end
normalize_feature_set(feature_set) click to toggle source

Normalizes feature set for quicker and more reliable calculation. @param feature_set [Matrix] feature set @return [Matrix] normalized feature set

# File lib/supervised_learning.rb, line 99
def normalize_feature_set(feature_set)
  # get mean for each column
  mean = []
  feature_set.column_vectors.each do |feature_set_column|
    mean << feature_set_column.mean
  end          
  # convert Array into Matrix of same dimension as feature_set for substraction
  # save for later usage
  @mean = Matrix[mean].vcopy(@number_of_training_examples - 1)

  # substract mean from feature set
  feature_set = feature_set - @mean

  # get standard deviation for each column
  standard_deviation = []
  feature_set.column_vectors.each do |feature_set_column|
    standard_deviation << feature_set_column.standard_deviation
  end
  # convert Array into Matrix of same dimension as feature_set for substraction
  # save for later usage
  @standard_deviation = Matrix[standard_deviation].vcopy(@number_of_training_examples - 1)

  # divide feature set by standard deviation
  feature_set = feature_set.element_division @standard_deviation
end
normalize_prediction(prediction) click to toggle source

Normalizes prediction. @param prediction [Matrix] prediction @return [Matrix] normalized prediction

# File lib/supervised_learning.rb, line 128
def normalize_prediction(prediction)      
  # convert prediction into Matrix of same dimension as @mean for substraction
  prediction = prediction.vcopy(@number_of_training_examples - 1)

  # substract mean
  prediction = prediction - @mean

  # divide feature set by standard deviation
  prediction = prediction.element_division @standard_deviation 
end
validate_prediction_input(prediction) click to toggle source

Validates prediction input. @param prediction [Matrix] prediction @raise [ArgumentError] if prediction is not a Matrix @raise [ArgumentError] if prediction does not have the correct number of columns (@training set minus @output_set) @raise [ArgumentError] if prediction has more than one row

# File lib/supervised_learning.rb, line 90
def validate_prediction_input(prediction)
  raise ArgumentError, 'input is not a Matrix' unless prediction.is_a? Matrix
  raise ArgumentError, 'input has more than one row' if prediction.row_size > 1
  raise ArgumentError, 'input has wrong number of columns' if prediction.column_size != @number_of_features 
end