class Spark::Mllib::LinearRegressionModel
Train a linear regression model with no regularization using Stochastic Gradient Descent. This solves the least squares regression formulation
f(weights) = 1/n * ||A * weights - y||^2
(which is the mean squared error). Here the data matrix A has n rows, and the input RDD holds the rows of A, each paired with its corresponding right-hand-side label y. See also the documentation for the precise formulation.
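To make the objective concrete, the snippet below evaluates f(weights) for the dense-vector example further down using plain Ruby arrays. The mean_squared_error helper is a hypothetical illustration of the formula above, not part of Spark::Mllib.

  # Hypothetical helper: f(weights) = 1/n * ||A * weights - y||^2
  # rows is an array of feature arrays (rows of A), labels is the array of y values.
  def mean_squared_error(rows, labels, weights)
    n = rows.size
    sum = rows.each_with_index.reduce(0.0) do |acc, (row, i)|
      prediction = row.zip(weights).map { |x, w| x * w }.sum
      acc + (prediction - labels[i])**2
    end
    sum / n
  end

  # With A = [[0.0], [1.0], [2.0], [3.0]], y = [0.0, 1.0, 3.0, 2.0]
  # and the trained weights from the example below:
  mean_squared_error([[0.0], [1.0], [2.0], [3.0]],
                     [0.0, 1.0, 3.0, 2.0],
                     [0.9285714285714286])   # => ~0.48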
Examples:
  Spark::Mllib.import

  # Dense vectors
  data = [
    LabeledPoint.new(0.0, [0.0]),
    LabeledPoint.new(1.0, [1.0]),
    LabeledPoint.new(3.0, [2.0]),
    LabeledPoint.new(2.0, [3.0])
  ]
  lrm = LinearRegressionWithSGD.train($sc.parallelize(data), initial_weights: [1.0])

  lrm.intercept   # => 0.0
  lrm.weights     # => [0.9285714285714286]

  lrm.predict([0.0]) < 0.5                                  # => true
  lrm.predict([1.0]) - 1 < 0.5                              # => true
  lrm.predict(SparseVector.new(1, {0 => 1.0})) - 1 < 0.5    # => true

  # Sparse vectors
  data = [
    LabeledPoint.new(0.0, SparseVector.new(1, {0 => 0.0})),
    LabeledPoint.new(1.0, SparseVector.new(1, {0 => 1.0})),
    LabeledPoint.new(3.0, SparseVector.new(1, {0 => 2.0})),
    LabeledPoint.new(2.0, SparseVector.new(1, {0 => 3.0}))
  ]
  lrm = LinearRegressionWithSGD.train($sc.parallelize(data), initial_weights: [1.0])

  lrm.intercept   # => 0.0
  lrm.weights     # => [0.9285714285714286]

  lrm.predict([0.0]) < 0.5                                  # => true
  lrm.predict(SparseVector.new(1, {0 => 1.0})) - 1 < 0.5    # => true
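For a linear regression model, a prediction is simply the dot product of the trained weights with the feature vector plus the intercept. The snippet below is a rough cross-check of that relationship using the values from the dense-vector example above; it is a sketch of the underlying arithmetic, not the gem's implementation.

  weights   = [0.9285714285714286]   # from the dense-vector example above
  intercept = 0.0
  features  = [1.0]

  # Manual prediction: dot(weights, features) + intercept
  prediction = features.zip(weights).map { |x, w| x * w }.sum + intercept

  prediction             # => 0.9285714285714286
  prediction - 1 < 0.5   # => true, matching lrm.predict([1.0]) - 1 < 0.5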