class Company::Mapping::CosineSimilarity
Implements cosine similarity between two non-zero vectors, that is, the cosine of the angle between them.
Public Instance Methods
calculate(doc1, doc2)
Calculates cosine similarity between two documents. The documents are expressed as vectors of tokens (bag of words model).
# File lib/company/mapping/vector_similarity/cosine_similarity.rb, line 8
def calculate(doc1, doc2)
  (dotProduct(doc1, doc2) / (Math.sqrt(d(doc1)) * Math.sqrt(d(doc2)))).round(4)
end
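The computation can be illustrated with a standalone sketch. It assumes each document is a Hash mapping a token to its numeric weight, which is what the calls to keys and doc[term] in the source suggest. The helper name cosine_similarity and the sample documents are hypothetical and not part of the library.

```ruby
# Standalone sketch; cosine_similarity is a hypothetical helper,
# not a method of Company::Mapping::CosineSimilarity.
def cosine_similarity(doc1, doc2)
  # Only tokens present in both documents contribute to the dot product.
  shared = doc1.keys & doc2.keys
  dot = shared.sum(0.0) { |token| doc1[token] * doc2[token] }

  # Euclidean norm of each document vector.
  norm1 = Math.sqrt(doc1.values.sum(0.0) { |weight| weight**2 })
  norm2 = Math.sqrt(doc2.values.sum(0.0) { |weight| weight**2 })

  (dot / (norm1 * norm2)).round(4)
end

# Assumed sample documents in bag-of-words form (token => count).
doc1 = { "ruby" => 2, "gem" => 1 }
doc2 = { "ruby" => 1, "rails" => 1 }

puts cosine_similarity(doc1, doc2) # 2 / (sqrt(5) * sqrt(2)) => 0.6325
```

Documents that share no tokens score 0.0, and a document compared with itself scores 1.0.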
Protected Instance Methods
common_tokens(doc1_tokens, doc2_tokens)
Returns the set of tokens common to both document vectors.
# File lib/company/mapping/vector_similarity/cosine_similarity.rb, line 29
def common_tokens(doc1_tokens, doc2_tokens)
  common_tokens = Set.new doc1_tokens
  common_tokens.intersection(Set.new doc2_tokens)
end
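As a small illustration of the set intersection used here, the sketch below applies the same Set calls to two made-up token lists. The variable names and tokens are assumptions for the example only.

```ruby
require 'set'

# Assumed token lists; in the library these would be the keys of each document vector.
doc1_tokens = %w[ruby gem mapping]
doc2_tokens = %w[ruby rails mapping]

# Same operation as common_tokens: build a Set from one list, intersect with the other.
common = Set.new(doc1_tokens).intersection(Set.new(doc2_tokens))

# Sorted so the printed order does not depend on Set iteration order.
puts common.to_a.sort.inspect # => ["mapping", "ruby"]
```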
d(doc)
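Calculates the squared magnitude of a document vector, that is, the sum of its squared term weights. The square root is taken by the caller, calculate.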
# File lib/company/mapping/vector_similarity/cosine_similarity.rb, line 22
def d(doc)
  doc.keys.inject(0.0) do |d, term|
    d + doc[term]**2.0
  end
end
dotProduct(doc1, doc2)
Calculates the dot product of the two document vectors. The dot product is an algebraic operation
that takes two equal-length sequences of numbers (usually coordinate vectors) and returns a single number. Because the document vectors are sparse, only tokens present in both documents contribute to the sum.
# File lib/company/mapping/vector_similarity/cosine_similarity.rb, line 15
def dotProduct(doc1, doc2)
  common_tokens(doc1.keys, doc2.keys).inject(0.0) do |dot_product, token|
    dot_product + doc2[token] * doc1[token]
  end
end