class TfIdf::Ja
Computes TF-IDF using a Japanese dictionary.
Public Class Methods
new()
Constructor. Loads the IDF dictionary and resets the instance.
# File lib/tfidf_ja.rb, line 16
def initialize
  @idfs = load_dic
  reset
end
Public Instance Methods
idf(word)
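Returns the IDF of a morpheme. If the morpheme is not in the dictionary, the dictionary's average IDF is returned instead.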
- word: a morpheme
- return: the IDF value
# File lib/tfidf_ja.rb, line 41
def idf(word)
  idf = @idfs.get(word)
  if(idf.nil?)
    idf = @idfs.average
  end
  return idf
end
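The fallback in `idf` can be tried without the bundled dictionary. The following is a minimal sketch, assuming a hypothetical `SimpleIdfTable` that responds to `get` and `average` like the object held in `@idfs`. The real dictionary class is not shown on this page, and treating `average` as an arithmetic mean is an assumption.

```ruby
# Hypothetical stand-in for the object held in @idfs.
# Assumption: it answers get(word) (nil when unknown) and average.
class SimpleIdfTable
  def initialize(values)
    @values = values
  end

  # IDF for a known morpheme, or nil when the morpheme is not in the table.
  def get(word)
    @values[word]
  end

  # Arithmetic mean of all stored IDF values (assumed definition).
  def average
    @values.values.sum / @values.size.to_f
  end
end

# Same lookup rule as the listing: use the stored IDF, else the average.
def lookup_idf(table, word)
  idf = table.get(word)
  idf.nil? ? table.average : idf
end

table = SimpleIdfTable.new({ "猫" => 2.0, "犬" => 4.0 })
known = lookup_idf(table, "猫")    # stored value
unknown = lookup_idf(table, "鳥")  # not stored, so the average is used
```

With these made-up values, `known` is 2.0 and `unknown` is 3.0.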
reset()
Resets the instance by clearing the accumulated TF counts.
# File lib/tfidf_ja.rb, line 22
def reset
  @tfs = {}
end
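Because `@tfs` is only cleared by `reset`, counts carry over between calls that add morphemes. The sketch below illustrates this with a hypothetical `TermCounter` class; it mirrors the listings on this page but is not part of the library.

```ruby
# Hypothetical counter mirroring how @tfs is filled and cleared.
class TermCounter
  attr_reader :tfs

  def initialize
    reset
  end

  # Discard all accumulated counts, as in the reset listing.
  def reset
    @tfs = {}
  end

  # Add the morphemes of one document to the running counts.
  def add(words)
    words.each { |word| @tfs[word] = @tfs.fetch(word, 0) + 1 }
    self
  end
end

counter = TermCounter.new
counter.add(%w[猫 犬 猫])
counter.add(%w[猫])                 # counts keep growing without reset
carried_over = counter.tfs["猫"]

counter.reset
counter.add(%w[猫])
after_reset = counter.tfs["猫"]
```

Here `carried_over` is 3 and `after_reset` is 1, which suggests calling `reset` between documents when each one should be scored independently.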
tfidf(words)
Computes TF-IDF values.
- words: an array of morphemes
- return: a hash table with key = morpheme, value = TF-IDF value
# File lib/tfidf_ja.rb, line 29
def tfidf(words)
  tfidfs = {}
  set_tf_map(words)
  @tfs.each_pair { |word, tf|
    tfidfs[word] = tf * idf(word)
  }
  return tfidfs
end
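As the listing shows, each score is the raw occurrence count multiplied by the IDF. The sketch below reproduces that arithmetic on its own. The `IDF` table, `AVERAGE_IDF`, and `tfidf_scores` are hypothetical names with made-up values, not the library's dictionary or API.

```ruby
# Hypothetical IDF values; the bundled dictionary's real values are not shown here.
IDF = { "猫" => 2.0, "犬" => 4.0 }.freeze
# Fallback for morphemes missing from the table (assumed to be the mean).
AVERAGE_IDF = IDF.values.sum / IDF.size

# Raw term count times IDF for each distinct morpheme.
def tfidf_scores(words)
  counts = Hash.new(0)
  words.each { |word| counts[word] += 1 }
  counts.each_with_object({}) do |(word, tf), scores|
    scores[word] = tf * IDF.fetch(word, AVERAGE_IDF)
  end
end

scores = tfidf_scores(%w[猫 猫 犬 鳥])
```

With these values, "猫" scores 2 × 2.0 = 4.0, "犬" scores 1 × 4.0 = 4.0, and the unknown "鳥" scores 1 × 3.0 = 3.0.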
Private Instance Methods
load_dic()
Loads the dictionary file.
# File lib/tfidf_ja.rb, line 52
def load_dic
  idf_dic = File.dirname(__FILE__) + "/../dic/#{Version.ruby}/idf.dic"
  File.open(idf_dic) { |f|
    return Marshal.load(f)
  }
end
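The dictionary is read with `Marshal.load`, so it must have been written with `Marshal.dump`. The following sketch shows that round trip on a temporary file. A plain Hash with made-up values stands in for the real dictionary object, whose class is not shown on this page.

```ruby
require "tempfile"

# Hypothetical dictionary contents; a plain Hash stands in for the real object.
idfs = { "猫" => 2.0, "犬" => 4.0 }

loaded = Tempfile.create("idf.dic") do |f|
  f.binmode               # Marshal data is binary
  Marshal.dump(idfs, f)   # write the serialized object
  f.rewind
  Marshal.load(f)         # read it back, as load_dic does
end
```

`Marshal.load` can instantiate arbitrary objects, so it should only be used on dictionary files from a trusted source.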
set_tf_map(words)
Computes the TF values as raw occurrence counts and adds them to the counts already held by the instance.
- words: an array of morphemes
- return: a hash table with key = morpheme, value = TF value (accumulated in @tfs)
# File lib/tfidf_ja.rb, line 62
def set_tf_map(words)
  words.each { |word|
    if(@tfs.key?(word))
      @tfs[word] += 1
    else
      @tfs[word] = 1
    end
  }
end
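The same counting can be written more compactly with a Hash whose default value is 0. This is an alternative sketch, not the library's code; `count_terms` is a hypothetical helper that returns the hash explicitly.

```ruby
# Count occurrences of each morpheme.
# Hash.new(0) supplies 0 for missing keys, so no key? branch is needed.
def count_terms(words, tfs = Hash.new(0))
  words.each { |word| tfs[word] += 1 }
  tfs
end

tfs = count_terms(%w[猫 犬 猫])
# Passing the previous hash back in continues the running count.
tfs = count_terms(%w[犬], tfs)
```

After both calls, `tfs` holds 2 for "猫" and 2 for "犬".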