class Company::Mapping::CosineSimilarity

Implements cosine similarity, which measures the cosine of the angle between two non-zero vectors.

Public Instance Methods

calculate(doc1, doc2)

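Calculates the cosine similarity between two documents. Each document is expressed as a vector of tokens (bag-of-words model), passed as a Hash that maps each token to its numeric weight. The result is rounded to four decimal places.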

# File lib/company/mapping/vector_similarity/cosine_similarity.rb, line 8
def calculate(doc1, doc2)
  (dotProduct(doc1, doc2) / (Math.sqrt(d(doc1)) * Math.sqrt(d(doc2)))).round(4)
end
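
The following is a minimal standalone sketch of the same formula, not the library's own code path. The names cosine_similarity and term_frequencies are hypothetical helpers introduced only for illustration, and the input is assumed to be a Hash of token counts.

```ruby
require 'set'

# Hypothetical helper (not part of the class): counts whitespace-separated tokens.
def term_frequencies(text)
  text.downcase.split.each_with_object(Hash.new(0)) { |token, counts| counts[token] += 1 }
end

# Hypothetical standalone restatement of the formula used by calculate:
# dot product over shared tokens, divided by the product of the two magnitudes.
def cosine_similarity(doc1, doc2)
  shared = Set.new(doc1.keys) & Set.new(doc2.keys)
  dot    = shared.inject(0.0) { |sum, token| sum + doc1[token] * doc2[token] }
  norm1  = Math.sqrt(doc1.values.inject(0.0) { |sum, w| sum + w**2 })
  norm2  = Math.sqrt(doc2.values.inject(0.0) { |sum, w| sum + w**2 })
  (dot / (norm1 * norm2)).round(4)
end

doc1 = term_frequencies("acme software ltd")
doc2 = term_frequencies("acme software inc")

# Two of three tokens are shared, so the score is 2 / (sqrt(3) * sqrt(3)).
similarity = cosine_similarity(doc1, doc2)
puts similarity
```

Documents that share no tokens score 0.0, and a document compared with itself scores 1.0.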

Protected Instance Methods

common_tokens(doc1_tokens, doc2_tokens)

Returns the set of tokens common to both document vectors.

# File lib/company/mapping/vector_similarity/cosine_similarity.rb, line 29
def common_tokens(doc1_tokens, doc2_tokens)
  common_tokens = Set.new doc1_tokens
  common_tokens.intersection(Set.new doc2_tokens)
end

d(doc)

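Calculates the squared magnitude of a document vector, that is, the sum of its squared term weights. The calculate method takes the square root of this value to obtain the magnitude.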

# File lib/company/mapping/vector_similarity/cosine_similarity.rb, line 22
def d(doc)
  doc.keys.inject(0.0) do |d, term|
    d + doc[term]**2.0
  end
end
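
As a small worked example of how this value relates to the magnitude, the sketch below restates the sum-of-squares step under a hypothetical name (sum_of_squares) and applies Math.sqrt to the result, as calculate does. The sample weights are made up for illustration.

```ruby
# Hypothetical standalone copy of the sum-of-squares step, for illustration only.
def sum_of_squares(doc)
  doc.keys.inject(0.0) { |sum, term| sum + doc[term]**2.0 }
end

doc = { "acme" => 3, "software" => 4 }

squared   = sum_of_squares(doc) # 3**2 + 4**2
magnitude = Math.sqrt(squared)  # length of the vector (3, 4)

puts squared
puts magnitude
```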

dotProduct(doc1, doc2)

Calculates the dot product of the two document vectors. The dot product is an algebraic operation that takes two equal-length sequences of numbers (usually coordinate vectors) and returns a single number. Only tokens present in both documents contribute to the sum, since a token missing from either document has a weight of zero.

# File lib/company/mapping/vector_similarity/cosine_similarity.rb, line 15
def dotProduct(doc1, doc2)
  common_tokens(doc1.keys, doc2.keys).inject(0.0) do |dot_product, token|
    dot_product + doc2[token] * doc1[token]
  end
end