class TfIdf::Ja

A class that computes TF-IDF using a Japanese dictionary.

Public Class Methods

new()

Constructor. Loads the IDF dictionary and resets the term counts.

# File lib/tfidf_ja.rb, line 16
def initialize
  @idfs = load_dic
  reset
end

Public Instance Methods

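idf(word)

Returns the IDF of a morpheme. If the morpheme is not in the dictionary, the dictionary's average IDF is returned instead.

word

A morpheme.

return

The IDF value.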

# File lib/tfidf_ja.rb, line 41
def idf(word)
  # Fall back to the dictionary average for unknown morphemes
  @idfs.get(word) || @idfs.average
end
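
The fallback for unknown morphemes can be sketched in isolation. The class of the object held in @idfs is not shown in this documentation, so `IdfTable` below is a hypothetical stand-in that only mimics the `get` and `average` calls seen in the source; the romanized tokens and IDF values are made up for illustration.

```ruby
# Hypothetical stand-in for the object stored in @idfs (assumption:
# the real dictionary class is not part of this documentation).
class IdfTable
  def initialize(values)
    @values = values
  end

  # Returns the stored IDF, or nil when the word is unknown.
  def get(word)
    @values[word]
  end

  # Mean of all stored IDF values, used as the fallback.
  def average
    @values.values.sum / @values.size.to_f
  end
end

# Same lookup rule as idf(word): stored value first, average otherwise.
def idf_with_fallback(table, word)
  table.get(word) || table.average
end

table = IdfTable.new("neko" => 2.0, "inu" => 4.0)
known = idf_with_fallback(table, "neko")    # found in the table
unknown = idf_with_fallback(table, "tori")  # not found, average is used
```

Using the average means an out-of-dictionary morpheme is treated as neither especially rare nor especially common.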

reset()

Resets the instance by discarding the accumulated term counts.

# File lib/tfidf_ja.rb, line 22
def reset
  @tfs = {}
end
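
Because the counts live in @tfs until reset is called, successive calls that count terms build on each other. The sketch below illustrates this with a hypothetical `TermCounter` class that copies only the counting state; it is not the library's class, and the tokens are made up.

```ruby
# Hypothetical miniature of the counting state (not the library's class).
class TermCounter
  def initialize
    reset
  end

  # Discards all counts gathered so far.
  def reset
    @tfs = {}
  end

  # Adds the occurrences in words to the running counts.
  def add(words)
    words.each { |word| @tfs[word] = @tfs.fetch(word, 0) + 1 }
    @tfs
  end
end

counter = TermCounter.new
counter.add(%w[neko inu])
carried_over = counter.add(%w[neko]).dup  # counts from the first call remain
counter.reset
fresh = counter.add(%w[neko]).dup         # counting starts again from zero
```

In practice this suggests calling reset between documents when each document should be scored on its own.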

tfidf(words)

Computes TF-IDF. Each morpheme's term count is multiplied by its IDF.

words

An array of morphemes.

return

A hash table whose keys are morphemes and whose values are TF-IDF values.

# File lib/tfidf_ja.rb, line 29
def tfidf(words)
  tfidfs = {}
  set_tf_map(words)
  @tfs.each_pair { |word, tf|
    tfidfs[word] = tf * idf(word)
  }
  return tfidfs
end
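
As a worked illustration of the calculation, the following self-contained sketch multiplies raw term counts by IDF values. The IDF numbers and the helper name `tfidf_sketch` are hypothetical; the real values come from the bundled dictionary, which is not reproduced here.

```ruby
# Minimal sketch: score = term count * IDF, with the average IDF used
# for words missing from the table (mirrors the fallback in idf).
def tfidf_sketch(words, idfs)
  average = idfs.values.sum / idfs.size.to_f

  # Count occurrences of each word.
  counts = Hash.new(0)
  words.each { |word| counts[word] += 1 }

  # Multiply each count by the word's IDF.
  counts.each_with_object({}) do |(word, tf), scores|
    scores[word] = tf * idfs.fetch(word, average)
  end
end

# Hypothetical IDF values for illustration only.
idfs = { "neko" => 2.0, "inu" => 4.0 }
scores = tfidf_sketch(%w[neko neko inu tori], idfs)
# "neko": 2 * 2.0, "inu": 1 * 4.0, "tori": 1 * 3.0 (average)
```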

Private Instance Methods


load_dic()

Loads the dictionary file.

# File lib/tfidf_ja.rb, line 52
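def load_dic
  # The dictionary is stored per Ruby version under dic/
  idf_dic = File.dirname(__FILE__) + "/../dic/#{Version.ruby}/idf.dic"
  # Marshal data is binary, so open the file in binary mode
  File.open(idf_dic, "rb") { |f| Marshal.load(f) }
end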
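
The dictionary is a marshaled Ruby object, so loading it amounts to a Marshal round trip. A minimal sketch follows, using a temporary file and a plain hash as a stand-in for the real dictionary object, whose class is not shown here.

```ruby
require "tempfile"

# Stand-in data; the real idf.dic holds the library's own dictionary object.
table = { "neko" => 2.0, "inu" => 4.0 }

loaded = Tempfile.create("idf_sketch") do |f|
  f.binmode               # Marshal output is binary, not text
  Marshal.dump(table, f)  # serialize to the file
  f.rewind
  Marshal.load(f)         # deserialize, as load_dic does
end
```

The Marshal format carries a version number and is tied to the classes available at load time, which is presumably why the dictionary path contains a Ruby version component. Marshal.load should only be used on trusted files such as the bundled dictionary.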
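
set_tf_map(words)

Computes the TF values. Here TF is the raw occurrence count of each morpheme, added to the counts already held in @tfs.

words

An array of morphemes.

return

A hash table whose keys are morphemes and whose values are TF values. The counts are stored in @tfs.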

# File lib/tfidf_ja.rb, line 62
def set_tf_map(words)
  words.each { |word|
    if(@tfs.key?(word))
      @tfs[word] += 1
    else
      @tfs[word] = 1
    end
  }
end
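
The same counting can be written more compactly with a hash whose default value is 0. This is an alternative sketch, not the library's code; `count_terms` is a hypothetical helper name.

```ruby
# Counts occurrences of each word; pass an existing hash to keep accumulating.
def count_terms(words, counts = Hash.new(0))
  words.each_with_object(counts) { |word, acc| acc[word] += 1 }
end

counts = count_terms(%w[neko inu neko])
more = count_terms(%w[inu], counts)  # continues from the earlier counts
```

Dividing each count by the total number of words would give a relative frequency instead of a raw count; the source shown above uses raw counts.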