get_svd {nlpembeds}    R Documentation
Compute randomized singular value decomposition (rSVD)
Description
Randomized SVD is an efficient approximation of truncated SVD, in which only the first principal components are returned. It is computed with the rsvd package, whose author suggests that, for it to be efficient, the number of dimensions requested k should satisfy k < n / 4, where n is the number of features; otherwise, one should use either full SVD or truncated SVD instead.

When computing SVD on PMI, we only want to use the singular values corresponding to positive eigenvalues. We do not know beforehand how many will have to be filtered out, so there are two parameters: 'embedding_dim' for the requested output dimension, and 'svd_rank' for the rank of the actual SVD computation, by default twice the requested dimension. A warning may be thrown if 'svd_rank' needs to be manually increased. Computation may be expensive, and manually optimizing the 'svd_rank' parameter might save significant time.
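The filtering step described above can be illustrated with a minimal sketch. It is not the package's implementation: base R's svd() stands in for the randomized SVD, filter_positive_components() is a hypothetical helper, and scaling the retained vectors by the square root of their singular values is an assumed convention. It relies on the fact that, for a symmetric matrix, a component with a positive eigenvalue has identical left and right singular vectors, while a negative eigenvalue flips the sign of one of them.

```r
# Illustrative only: base::svd() replaces the randomized SVD, and
# filter_positive_components() is a hypothetical helper, not part of nlpembeds.
filter_positive_components <- function(m_sym, embedding_dim,
                                       svd_rank = embedding_dim * 2) {
  svd_rank <- min(svd_rank, nrow(m_sym))
  dec <- svd(m_sym, nu = svd_rank, nv = svd_rank)

  # For a symmetric matrix, u_i == v_i when the eigenvalue is positive and
  # u_i == -v_i when it is negative, so the sign of <u_i, v_i> tells us
  # which components to keep.
  alignment <- colSums(dec$u * dec$v)
  keep <- which(alignment > 0)

  if (length(keep) < embedding_dim) {
    warning("Fewer positive components than requested; consider a larger svd_rank.")
  }
  keep <- keep[seq_len(min(length(keep), embedding_dim))]

  # Assumed convention: scale left singular vectors by sqrt(singular value).
  emb <- dec$u[, keep, drop = FALSE] %*%
    diag(sqrt(dec$d[keep]), nrow = length(keep))
  rownames(emb) <- rownames(m_sym)
  emb
}

# Toy symmetric matrix with known eigenvalues 5, -3, 2 and -1.
q <- 0.5 * matrix(c(1,  1,  1,  1,
                    1, -1,  1, -1,
                    1,  1, -1, -1,
                    1, -1, -1,  1), nrow = 4)
m_toy <- q %*% diag(c(5, -3, 2, -1)) %*% t(q)

# Two of the four components have negative eigenvalues, so a rank-4
# decomposition is needed to recover 2 usable dimensions.
emb <- filter_positive_components(m_toy, embedding_dim = 2)
dim(emb)
```

In this toy case half of the computed components are discarded, which is why a decomposition rank larger than the requested output dimension is needed.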
Usage
get_svd(m_pmi, embedding_dim = 100, svd_rank = embedding_dim * 2)
Arguments
m_pmi          Pointwise mutual information matrix.
embedding_dim  Number of output embedding dimensions requested.
svd_rank       Number of SVD dimensions to compute.
Value
Rectangular matrix of SVD embeddings.
Examples
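# Toy EHR data: patient, month, code and count
df_ehr = data.frame(Patient = c(1, 1, 2, 1, 2, 1, 1, 3, 4),
                    Month = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
                    Parent_Code = c('C1', 'C2', 'C2', 'C1', 'C1', 'C1',
                                    'C2', 'C3', 'C4'),
                    Count = 1:9)

# Build co-occurrences, then the PMI matrix
spm_cooc = build_df_cooc(df_ehr)
m_pmi = get_pmi(spm_cooc)

# Request 2 embedding dimensions (svd_rank defaults to 4)
m_svd = get_svd(m_pmi, embedding_dim = 2)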