class Company::Mapping::InverseDocumentFrequency
InverseDocumentFrequency contains a basic implementation of inverse document frequency (IDF). The IDF of a token is the logarithmically scaled inverse fraction of the documents that contain it: the total number of documents is divided by the number of documents containing the token, and the logarithm of that quotient is taken.
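For reference, the definition above can be written as follows, where $N$ is the number of documents in the corpus and $\mathrm{df}(t)$ is the number of documents containing token $t$. The natural logarithm is an inference from the source of `calculate`, which uses Ruby's one-argument `Math.log`.

$$
\mathrm{idf}(t) = \ln \frac{N}{\mathrm{df}(t)}
$$

For example, with $N = 4$, a token found in one document gets $\ln 4 \approx 1.386$, while a token found in all four gets $\ln 1 = 0$, so it carries no weight.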
Public Class Methods
new(corpus)
```ruby
# File lib/company/mapping/tfidf/idf/inverse_document_frequency.rb, line 9
def initialize(corpus)
  @corpus = corpus
end
```
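The constructor only stores `corpus`. Judging from the other methods, the corpus must respond to `size` and `each_with_object`, and each document must respond to `bag_of_words` with a Hash keyed by token. A minimal sketch of building such a corpus follows; the `Doc` struct, the `to_doc` helper, and the lowercase/word-split tokenization are all hypothetical assumptions, not part of the library.

```ruby
# Hypothetical document type: the class only calls `bag_of_words.keys`,
# so any object exposing a Hash of token => count would do.
Doc = Struct.new(:bag_of_words)

# Hypothetical helper: builds a bag of words by lowercasing the text
# and counting each run of word characters as a token.
def to_doc(text)
  counts = Hash.new(0)
  text.downcase.scan(/\w+/) { |token| counts[token] += 1 }
  Doc.new(counts)
end

corpus = ["Acme Corp", "Acme Holdings", "Globex Corp"].map { |t| to_doc(t) }
puts corpus.size                                # prints 3
puts corpus.first.bag_of_words.keys.join(",")   # prints acme,corp
```

Such a corpus could then be passed to `Company::Mapping::InverseDocumentFrequency.new(corpus)`.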
Public Instance Methods
calculate()
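Calculates the basic inverse document frequency of each token in the corpus. Returns a Hash that maps each token to the natural logarithm of the number of documents divided by the token's document frequency.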
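```ruby
# File lib/company/mapping/tfidf/idf/inverse_document_frequency.rb, line 14
def calculate
  document_frequency.each_with_object({}) do |(word, freq), idf|
    # idf = ln(N / df); freq is a Float (see document_frequency),
    # so the division is not truncated to an Integer.
    idf[word] = Math.log(@corpus.size / freq)
  end
end
```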
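As a sketch of the mapping step alone, the standalone `calculate_idf` method below (a hypothetical name) mirrors `calculate` but takes the document count and a precomputed document-frequency Hash as arguments. The frequencies are made-up Floats for an assumed four-document corpus. The last line shows why the Float counts matter: with Integer counts, Ruby's integer division would truncate `4 / 3` to `1`.

```ruby
# Mirrors `calculate`: maps each token's document frequency to ln(N / df).
def calculate_idf(num_docs, document_frequency)
  document_frequency.each_with_object({}) do |(word, freq), idf|
    idf[word] = Math.log(num_docs / freq)
  end
end

# Hypothetical document frequencies for a corpus of 4 documents,
# stored as Floats like the ones `document_frequency` returns.
df = { "acme" => 3.0, "corp" => 3.0, "holdings" => 1.0, "globex" => 1.0 }
idf = calculate_idf(4, df)

puts idf["acme"].round(4)      # prints 0.2877, i.e. ln(4/3)
puts idf["holdings"].round(4)  # prints 1.3863, i.e. ln(4)

# With Integer counts the quotient would be truncated: 4 / 3 == 1, ln(1) == 0.
puts Math.log(4 / 3)           # prints 0.0
```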
maxIDF()
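Returns the largest IDF any token can have, the natural logarithm of the number of documents, which is reached by a token that occurs in exactly one document.
```ruby
# File lib/company/mapping/tfidf/idf/inverse_document_frequency.rb, line 20
def maxIDF
  Math.log(@corpus.size * 1.0)
end
```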
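A quick sanity check of what `maxIDF` represents: since every token occurs in at least one document, ln(N / df) is largest when df is 1, giving ln(N). The standalone `max_idf` and `idf` methods below are hypothetical stand-ins for the class's methods. Dividing an IDF by this maximum to scale it into [0, 1] is shown only as one possible use, an assumption rather than something the source does.

```ruby
# Hypothetical stand-in for `maxIDF`: ln(N).
def max_idf(num_docs)
  Math.log(num_docs * 1.0)
end

# Hypothetical stand-in for a single IDF value: ln(N / df).
def idf(num_docs, doc_freq)
  Math.log(num_docs / doc_freq.to_f)
end

n = 4
puts max_idf(n) == idf(n, 1)                         # prints true
puts (1..n).map { |d| idf(n, d) }.max == max_idf(n)  # prints true
puts max_idf(1)  # prints 0.0: in a one-document corpus every IDF is 0

# Possible (assumed) use: scale an IDF into [0, 1] for N > 1.
ratio = idf(n, 2) / max_idf(n)
puts ratio.round(4)  # prints 0.5, since ln(2) / ln(4) = 0.5
```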
Protected Instance Methods
document_frequency()
Calculates, for each unique token, the number of documents in the corpus that contain it.
```ruby
# File lib/company/mapping/tfidf/idf/inverse_document_frequency.rb, line 27
def document_frequency
  @corpus.each_with_object({}) do |doc, df|
    doc.bag_of_words.keys.each do |word|
      df[word] = (df.fetch(word) { 0.0 }) + 1.0
    end
  end
end
```
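To illustrate the counting rule, the sketch re-creates `document_frequency` as a standalone method over a hypothetical `Doc` struct. Because only `bag_of_words.keys` is iterated, a token repeated within one document still adds just 1.0 to its count, and the counts come out as Floats.

```ruby
# Hypothetical document type exposing only `bag_of_words` (token => count).
Doc = Struct.new(:bag_of_words)

# Same counting logic as `document_frequency`: iterate over each document's
# unique tokens (the Hash keys), so in-document counts are ignored.
def document_frequency(corpus)
  corpus.each_with_object({}) do |doc, df|
    doc.bag_of_words.keys.each do |word|
      df[word] = (df.fetch(word) { 0.0 }) + 1.0
    end
  end
end

corpus = [
  Doc.new({ "corp" => 5 }),              # "corp" occurs five times here...
  Doc.new({ "corp" => 1, "acme" => 1 })  # ...and once here
]

df = document_frequency(corpus)
puts df["corp"]  # prints 2.0, not 6.0
puts df["acme"]  # prints 1.0
```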