class Spark::Mllib::NaiveBayesModel

NaiveBayesModel

Model for Naive Bayes classifiers.

Contains two parameters:

pi

vector of logs of class priors (dimension C, where C is the number of classes)

theta

matrix of logs of class-conditional probabilities (C x D, where D is the number of features)
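
As a rough illustration of what these two parameters hold, the sketch below builds them in plain Ruby for C = 2 classes and D = 2 features. The probability values and the names log_pi and log_theta are hypothetical, chosen only to show the shapes and the log transform; they are not produced by NaiveBayes.train.

```ruby
# Hypothetical probabilities for C = 2 classes and D = 2 features.
priors = [0.6, 0.4]             # P(class), one entry per class
conditionals = [
  [0.25, 0.75],                 # P(feature | class 0)
  [0.8, 0.2]                    # P(feature | class 1)
]

# The model keeps natural logarithms rather than raw probabilities.
log_pi = priors.map { |p| Math.log(p) }                             # length C
log_theta = conditionals.map { |row| row.map { |p| Math.log(p) } }  # C rows, D columns
```

Working in log space turns the product of per-feature probabilities into a sum, which avoids numeric underflow when D is large.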

Examples:

Spark::Mllib.import

# Dense vectors
data = [
  LabeledPoint.new(0.0, [0.0, 0.0]),
  LabeledPoint.new(0.0, [0.0, 1.0]),
  LabeledPoint.new(1.0, [1.0, 0.0])
]
model = NaiveBayes.train($sc.parallelize(data))

model.predict([0.0, 1.0])
# => 0.0
model.predict([1.0, 0.0])
# => 1.0

# Sparse vectors
data = [
  LabeledPoint.new(0.0, SparseVector.new(2, {1 => 0.0})),
  LabeledPoint.new(0.0, SparseVector.new(2, {1 => 1.0})),
  LabeledPoint.new(1.0, SparseVector.new(2, {0 => 1.0}))
]
model = NaiveBayes.train($sc.parallelize(data))

model.predict(SparseVector.new(2, {1 => 1.0}))
# => 0.0
model.predict(SparseVector.new(2, {0 => 1.0}))
# => 1.0

Attributes

labels[R]
pi[R]
theta[R]

Public Class Methods

new(labels, pi, theta)
# File lib/spark/mllib/classification/naive_bayes.rb, line 47
def initialize(labels, pi, theta)
  @labels = labels
  @pi = pi
  @theta = theta
end

Public Instance Methods

predict(vector)

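Predicts the label for a single data point using the trained model. The argument is first converted with Spark::Mllib::Vectors.to_vector, and the label with the highest score is returned. The source shown below does not handle an RDD of points.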

# File lib/spark/mllib/classification/naive_bayes.rb, line 55
def predict(vector)
  vector = Spark::Mllib::Vectors.to_vector(vector)
  array = (vector.dot(theta) + pi).to_a
  index = array.index(array.max)
  labels[index]
end
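
The scoring rule in predict can be sketched without Spark, using plain Ruby arrays. This is a minimal illustration, not the library's implementation: naive_bayes_predict, log_pi, log_theta, and the parameter values are hypothetical, and log_theta is assumed to be laid out as C rows by D columns.

```ruby
# Hypothetical log-parameters for 2 classes and 2 features.
labels = [0.0, 1.0]
log_pi = [Math.log(0.6), Math.log(0.4)]
log_theta = [
  [Math.log(0.25), Math.log(0.75)],  # class 0
  [Math.log(0.8), Math.log(0.2)]     # class 1
]

# Score each class as log prior + dot(log conditionals, features),
# then return the label of the highest-scoring class.
def naive_bayes_predict(labels, log_pi, log_theta, features)
  scores = log_pi.each_with_index.map do |prior, c|
    prior + log_theta[c].zip(features).sum { |w, x| w * x }
  end
  labels[scores.index(scores.max)]
end

naive_bayes_predict(labels, log_pi, log_theta, [0.0, 1.0])
# => 0.0
naive_bayes_predict(labels, log_pi, log_theta, [1.0, 0.0])
# => 1.0
```

As in the method above, ties go to the first class with the maximum score, because Array#index returns the first match.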