class Spark::Mllib::NaiveBayes
Public Class Methods
train(rdd, lambda=1.0)
Trains a Naive Bayes model given an RDD of (label, features) pairs.
This is the Multinomial NB (tinyurl.com/lsdw6p) which can handle all kinds of discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification. By making every vector a 0-1 vector, it can also be used as Bernoulli NB (tinyurl.com/p7c96j6). The input feature values must be nonnegative.
Arguments:
- rdd: RDD of LabeledPoint.
- lambda: The smoothing parameter.
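To make the role of the smoothing parameter concrete, here is a minimal pure-Ruby sketch of multinomial Naive Bayes training with additive smoothing. It is not the library's implementation, which delegates training to Spark through the JVM bridge. The helper names `train_multinomial_nb` and `predict_multinomial_nb`, the `smoothing` argument (standing in for `lambda`), and the plain-array input format are all assumptions made for illustration. The formulas follow a common formulation in which both the class priors and the per-feature likelihoods are smoothed.

```ruby
# Hypothetical helper, not part of ruby-spark: trains a multinomial
# Naive Bayes model on an in-memory array of [label, features] pairs.
# Feature values must be nonnegative. `smoothing` plays the role of lambda.
def train_multinomial_nb(points, smoothing = 1.0)
  num_features = points.first[1].size
  grouped = points.group_by { |label, _features| label }
  labels = grouped.keys.sort
  total = points.size.to_f

  pi = []    # smoothed log priors, one per label
  theta = [] # smoothed log likelihoods, one row per label

  labels.each do |label|
    rows = grouped[label]

    # log((n_c + smoothing) / (n + k * smoothing))
    pi << Math.log((rows.size + smoothing) / (total + labels.size * smoothing))

    # Sum each feature over all points of this class.
    sums = Array.new(num_features, 0.0)
    rows.each do |_label, features|
      features.each_with_index { |value, j| sums[j] += value }
    end

    # log((sum_j + smoothing) / (sum over all features + d * smoothing))
    denom = sums.sum + num_features * smoothing
    theta << sums.map { |s| Math.log((s + smoothing) / denom) }
  end

  [labels, pi, theta]
end

# Hypothetical helper: returns the label with the highest log score.
def predict_multinomial_nb(model, features)
  labels, pi, theta = model
  scores = labels.each_index.map do |i|
    pi[i] + features.each_with_index.sum { |value, j| value * theta[i][j] }
  end
  labels[scores.index(scores.max)]
end

points = [
  [0.0, [1.0, 0.0]],
  [0.0, [2.0, 0.0]],
  [1.0, [0.0, 1.0]]
]
model = train_multinomial_nb(points, 1.0)
predict_multinomial_nb(model, [1.0, 0.0]) # => 0.0
predict_multinomial_nb(model, [0.0, 1.0]) # => 1.0
```

With `smoothing` greater than zero, a feature that never occurs in a class still gets a finite log likelihood, so a single unseen feature cannot rule a class out.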
# File lib/spark/mllib/classification/naive_bayes.rb, line 82
def self.train(rdd, lambda=1.0)
  # Validation
  first = rdd.first
  unless first.is_a?(LabeledPoint)
    raise Spark::MllibError, "RDD should contains LabeledPoint, got #{first.class}"
  end

  labels, pi, theta = Spark.jb.call(RubyMLLibAPI.new, 'trainNaiveBayesModel', rdd, lambda)
  theta = Spark::Mllib::Matrices.dense(theta.size, theta.first.size, theta)

  NaiveBayesModel.new(labels, pi, theta)
end