similarity_matrix {keyclust}    R Documentation

Create a cosine similarity matrix from a fitted word embedding model

Description

This function takes a fitted word embedding model and computes the cosine similarity between each pair of words.

Usage

similarity_matrix(x, words = NULL, max_terms = 25000)

Arguments

x

A word embedding matrix

words

A vector of words, or the name of the column in x that holds the words of the fitted word embeddings

max_terms

The maximum number of embedding terms to include in the output similarity matrix. Assumes that the embedding input is ordered by word frequency.
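
One way to picture the max_terms cap, assuming it keeps the leading rows of a frequency-ordered embedding (an assumption about the behaviour, not the package's actual code). The toy matrix emb and the helper cap_terms are hypothetical and exist only for this sketch.

```r
# Toy embedding: one row per word, assumed ordered from most to least frequent
emb <- matrix(
  c(1, 0,
    0, 1,
    1, 1,
    2, 1),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("the", "of", "cat", "zygote"), NULL)
)

# Hypothetical helper (not part of keyclust): keep at most max_terms leading rows
cap_terms <- function(emb, max_terms) {
  emb[seq_len(min(nrow(emb), max_terms)), , drop = FALSE]
}

capped <- cap_terms(emb, max_terms = 3)
rownames(capped)  # the three most frequent words under the ordering assumption
```

If the input is not ordered by frequency, a cap of this kind would drop arbitrary terms rather than rare ones, which is why the ordering assumption matters.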

Value

An N x N matrix of cosine similarity scores between words from a fitted word embedding model.
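
For intuition about the shape and contents of the returned matrix, here is a minimal base R sketch of pairwise cosine similarity on a toy embedding. The matrix emb and the helper cosine_sim are hypothetical and are not the keyclust implementation; they only illustrate the general technique of scaling each word vector to unit length and taking all pairwise dot products.

```r
# Toy embedding with one row per word (values are made up for illustration)
emb <- matrix(
  c(1, 0,
    0, 1,
    1, 1),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("north", "east", "northeast"), NULL)
)

# Hypothetical helper, not the keyclust implementation:
# scale each row to unit length, then take all pairwise dot products
cosine_sim <- function(emb) {
  unit <- emb / sqrt(rowSums(emb^2))
  tcrossprod(unit)
}

sim <- cosine_sim(emb)
dim(sim)       # one row and one column per word
round(sim, 3)  # symmetric, with 1 on the diagonal
```

In this toy case the orthogonal vectors "north" and "east" score 0, while "northeast" scores about 0.707 against each of them.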

Examples

# Compute a cosine similarity matrix from the sample word embeddings
simmat <- similarity_matrix(wordemb_FasttextEng_sample, words = "words")

[Package keyclust version 1.2.5 Index]