class Company::Mapping::InverseDocumentFrequency

InverseDocumentFrequency provides a basic implementation of inverse document frequency (IDF). The IDF of a token is the logarithmically scaled inverse of the fraction of documents that contain it: the total number of documents is divided by the number of documents containing the token, and the natural logarithm of that quotient is taken.
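
Written as a formula, the definition above reads as follows. The symbols N (number of documents in the corpus) and df(t) (number of documents containing token t) are notation introduced here for illustration; they do not appear in the source.

```latex
\mathrm{idf}(t) = \ln\!\left(\frac{N}{\mathrm{df}(t)}\right),
\qquad 1 \le \mathrm{df}(t) \le N
```

A token found in every document gets an IDF of ln(1) = 0, and a token found in only one document gets the largest value, ln(N).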

Public Class Methods

new(corpus)
# File lib/company/mapping/tfidf/idf/inverse_document_frequency.rb, line 9
def initialize(corpus)
  @corpus = corpus
end

Public Instance Methods

calculate()

Calculates the basic inverse document frequency of each token contained in a corpus of documents and returns a Hash that maps each token to its IDF value.

# File lib/company/mapping/tfidf/idf/inverse_document_frequency.rb, line 14
def calculate
  document_frequency.each_with_object({}) do |(word, freq), idf|
    idf[word] = Math.log(@corpus.size/freq)
  end
end
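
maxIDF()

Returns the largest IDF value that calculate can produce for the corpus: log(N) for a corpus of N documents, which is the IDF of a token that appears in exactly one document.
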
# File lib/company/mapping/tfidf/idf/inverse_document_frequency.rb, line 20
def maxIDF
  Math.log(@corpus.size * 1.0)
end
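
The following is a minimal sketch of how calculate and maxIDF relate, not the library's own test code. It assumes only what the listings show: the corpus responds to size and each, and every document responds to bag_of_words with a Hash keyed by token. `Doc` and `SketchIDF` are hypothetical stand-ins that mirror the listed methods so the example runs without the gem installed, and `max_idf` is a renamed stand-in for maxIDF.

```ruby
# Hypothetical stand-in for a corpus document; only bag_of_words is needed.
Doc = Struct.new(:bag_of_words)

# Hypothetical stand-in mirroring the documented methods.
class SketchIDF
  def initialize(corpus)
    @corpus = corpus
  end

  # Token => log(total documents / documents containing the token).
  def calculate
    document_frequency.each_with_object({}) do |(word, freq), idf|
      idf[word] = Math.log(@corpus.size / freq) # freq is a Float
    end
  end

  # Upper bound on any IDF value for this corpus.
  def max_idf
    Math.log(@corpus.size * 1.0)
  end

  protected

  # Token => number of documents containing it (as a Float).
  def document_frequency
    @corpus.each_with_object({}) do |doc, df|
      doc.bag_of_words.keys.each do |word|
        df[word] = df.fetch(word, 0.0) + 1.0
      end
    end
  end
end

corpus = [
  Doc.new({ "ruby" => 2, "gem" => 1 }),
  Doc.new({ "ruby" => 1 }),
  Doc.new({ "ruby" => 3, "idf" => 1 })
]

idf = SketchIDF.new(corpus)
scores = idf.calculate
# "ruby" occurs in all 3 documents, so its score is log(3 / 3) = 0.0.
# "gem" and "idf" occur in 1 document each, so they score log(3), the same as max_idf.
```

Because every token in the result appears in at least one document, no score can exceed the value returned by max_idf, which makes it usable as a normalization bound.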

Protected Instance Methods

document_frequency()

Calculates, for each unique token in the corpus, the number of documents that contain it.

# File lib/company/mapping/tfidf/idf/inverse_document_frequency.rb, line 27
def document_frequency
  @corpus.each_with_object({}) do |doc, df|
    doc.bag_of_words.keys.each do |word|
      df[word] = (df.fetch(word) { 0.0 }) + 1.0
    end
  end
end
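
As a small illustration of the counting step, the sketch below works on plain hashes rather than the library's document objects. The `bags` data is hypothetical. It shows that a token is counted once per document, regardless of how often it occurs inside that document.

```ruby
# Hypothetical bags of words (token => count within the document).
bags = [
  { "ruby" => 2, "gem" => 1 },
  { "ruby" => 5 }
]

# Add 1.0 per document for every distinct token in that document.
df = bags.each_with_object(Hash.new(0.0)) do |bag, counts|
  bag.each_key { |word| counts[word] += 1.0 }
end
# "ruby" is counted twice (two documents), "gem" once.
```

Keeping the counts as Floats, as the listing does, means the later division by the corpus size is floating-point division rather than integer division.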