class Spark::Mllib::NaiveBayesModel
Model for Naive Bayes classifiers.
Contains two parameters:
- pi: vector of logs of class priors (dimension C)
- theta: matrix of logs of class conditional probabilities (C x D)
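To make the shapes of the two parameters concrete, here is a minimal plain-Ruby sketch that derives `pi` and `theta` from a toy data set. It assumes a multinomial model with additive smoothing; the `smoothing` constant, the toy `data` array, and all variable names are illustrative assumptions and are not taken from the library's source.

```ruby
# Toy training data as [label, feature counts] pairs (illustrative values).
data = [
  [0.0, [0.0, 0.0]],
  [0.0, [0.0, 1.0]],
  [1.0, [1.0, 0.0]]
]
smoothing = 1.0 # assumed additive smoothing constant

labels = data.map(&:first).uniq.sort
num_features = data.first.last.size

# pi[c]: log of the smoothed fraction of points with label c (dimension C).
pi = labels.map do |label|
  count = data.count { |l, _| l == label }
  Math.log(count + smoothing) - Math.log(data.size + labels.size * smoothing)
end

# theta[c][d]: log of the smoothed share of feature d within class c (C x D).
theta = labels.map do |label|
  sums = Array.new(num_features, 0.0)
  data.each do |l, features|
    next unless l == label
    features.each_with_index { |x, d| sums[d] += x }
  end
  total = sums.sum
  sums.map { |s| Math.log(s + smoothing) - Math.log(total + num_features * smoothing) }
end
```

With two classes and two features, `pi` has two entries and `theta` is a 2 x 2 array of rows. Exponentiating `pi`, or any single row of `theta`, gives probabilities that sum to 1.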
Examples:

  Spark::Mllib.import

  # Dense vectors
  data = [
    LabeledPoint.new(0.0, [0.0, 0.0]),
    LabeledPoint.new(0.0, [0.0, 1.0]),
    LabeledPoint.new(1.0, [1.0, 0.0])
  ]
  model = NaiveBayes.train($sc.parallelize(data))

  model.predict([0.0, 1.0]) # => 0.0
  model.predict([1.0, 0.0]) # => 1.0

  # Sparse vectors
  data = [
    LabeledPoint.new(0.0, SparseVector.new(2, {1 => 0.0})),
    LabeledPoint.new(0.0, SparseVector.new(2, {1 => 1.0})),
    LabeledPoint.new(1.0, SparseVector.new(2, {0 => 1.0}))
  ]
  model = NaiveBayes.train($sc.parallelize(data))

  model.predict(SparseVector.new(2, {1 => 1.0})) # => 0.0
  model.predict(SparseVector.new(2, {0 => 1.0})) # => 1.0
Attributes
labels[R]
pi[R]
theta[R]
Public Class Methods
new(labels, pi, theta)
  # File lib/spark/mllib/classification/naive_bayes.rb, line 47
  def initialize(labels, pi, theta)
    @labels = labels
    @pi = pi
    @theta = theta
  end
Public Instance Methods
predict(vector)
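Predict values for a single data point or an RDD of points using the
trained model. For a single point, the result is the label whose log
score (the dot product of the point with theta, plus pi) is largest.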
  # File lib/spark/mllib/classification/naive_bayes.rb, line 55
  def predict(vector)
    vector = Spark::Mllib::Vectors.to_vector(vector)
    array = (vector.dot(theta) + pi).to_a
    index = array.index(array.max)
    labels[index]
  end
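The scoring rule in `predict` can be followed without a Spark context. The sketch below is a plain-Ruby stand-in that uses arrays instead of the library's vector types. The class name `SimpleNaiveBayesModel` and the hand-picked probabilities are hypothetical and chosen only for illustration; it is not the library's implementation.

```ruby
# Hypothetical, simplified stand-in for NaiveBayesModel built on plain arrays.
class SimpleNaiveBayesModel
  attr_reader :labels, :pi, :theta

  def initialize(labels, pi, theta)
    @labels = labels
    @pi = pi       # log class priors, one per class
    @theta = theta # log conditional probabilities, one row per class
  end

  # Score each class as pi[c] + sum_d(theta[c][d] * x[d]) and return the
  # label of the highest-scoring class.
  def predict(features)
    scores = theta.each_with_index.map do |row, c|
      pi[c] + row.zip(features).sum { |t, x| t * x }
    end
    labels[scores.index(scores.max)]
  end
end

# Illustrative parameters: class 0.0 favours feature 1, class 1.0 favours feature 0.
labels = [0.0, 1.0]
pi = [Math.log(0.5), Math.log(0.5)]
theta = [
  [Math.log(0.2), Math.log(0.8)],
  [Math.log(0.8), Math.log(0.2)]
]

model = SimpleNaiveBayesModel.new(labels, pi, theta)
model.predict([0.0, 1.0]) # => 0.0
model.predict([1.0, 0.0]) # => 1.0
```

Working in log space turns the product of probabilities into a sum, which is why the prediction reduces to a dot product followed by an argmax.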