distances_omv {jmvReadWrite}R Documentation

Calculates distances (returning a symmetric matrix) from a raw data matrix in .omv-files for the statistical spreadsheet 'jamovi' (https://www.jamovi.org)

Description

Calculates distances (returning a symmetric matrix) from a raw data matrix in .omv-files for the statistical spreadsheet 'jamovi' (https://www.jamovi.org)

Usage

distances_omv(
  dtaInp = NULL,
  fleOut = "",
  varDst = c(),
  clmDst = TRUE,
  stdDst = "none",
  nmeDst = "euclid",
  mtxSps = FALSE,
  mtxTrL = FALSE,
  mtxDgn = TRUE,
  usePkg = c("foreign", "haven"),
  selSet = "",
  ...
)

Arguments

dtaInp

Either a data frame or the name of a data file to be read (including the path, if required; "FILENAME.ext"; default: NULL); files can be of any supported file type, see Details below.

fleOut

Name of the data file to be written (including the path, if required; "FILE_OUT.omv"; default: ""); if empty, the resulting data frame is returned instead.

varDst

Variable (default: c()) containing a character vector with the names of the variables for which distances are to be calculated. See Details for more information.

clmDst

Whether the distances shall be calculated between columns (TRUE) or rows (FALSE; default: TRUE). See Details for more information.

stdDst

Character string indicating whether the variables in varDst are to be standardized and how (default: "none"). See Details for more information.

nmeDst

Character string indicating which distance measure is to be calculated calculated (default: "euclidean"). See Details for more information.

mtxSps

Whether the symmetric matrix to be returned should be sparse (default: FALSE)

mtxTrL

Whether the symmetric matrix to be returned should only contain the lower triangular (default: FALSE)

mtxDgn

Whether the symmetric matrix to be returned should retain the values in the main diagonal (default: TRUE)

usePkg

Name of the package: "foreign" or "haven" that shall be used to read SPSS, Stata, and SAS files; "foreign" is the default (it comes with base R), but "haven" is newer and more comprehensive.

selSet

Name of the data set that is to be selected from the workspace (only applies when reading .RData-files)

...

Additional arguments passed on to methods; see Details below.

Details

Value

a data frame containing a symmetric matrix (only returned if fleOut is empty) containing the distances between the variables / columns (clmDst == TRUE) or rows (clmDst == FALSE)

See Also

distances_omv internally uses the following function for calculating the distances for interval data stats::dist(). It furthermore uses the following functions for reading and writing data files in different formats: read_omv() and write_omv() for jamovi-files, utils::read.table() for CSV / TSV files, load() for reading .RData-files, readRDS() for .rds-files, haven::read_sav() or foreign::read.spss() for SPSS-files, haven::read_dta() or foreign::read.dta() for Stata-files, haven::read_sas() for SAS-data-files, and haven::read_xpt() or foreign::read.xport() for SAS-transport-files.

Examples

## Not run: 
# create matrices for the different types of distance measures: continuous
# (cntFrm), frequency counts (frqFrm) or binary (binFrm); all 20 R x 5 C
set.seed(1)
cntFrm <- stats::setNames(as.data.frame(matrix(rnorm(100, sd = 10),
            ncol = 5)), sprintf("C_%02d", seq(5)))
frqFrm <- stats::setNames(as.data.frame(matrix(sample(seq(10), 100,
            replace = TRUE), ncol = 5)), sprintf("F_%02d", seq(5)))
binFrm <- stats::setNames(as.data.frame(matrix(sample(c(TRUE, FALSE), 100,
            replace = TRUE), ncol = 5)), sprintf("B_%02d", seq(5)))
nmeOut <- tempfile(fileext = ".omv")

# calculates the distances between columns, nmeDst is not required: "euclid"
# is the default
jmvReadWrite::distances_omv(dtaInp = cntFrm, fleOut = nmeOut, varDst =
  names(cntFrm), nmeDst = "euclid")
dtaFrm <- jmvReadWrite::read_omv(nmeOut)
unlink(nmeOut)
# the resulting matrix (10 x 10) with the Euclidian distances
print(dtaFrm)

# calculates the (Euclidean) distances between rows (clmDst = FALSE)
jmvReadWrite::distances_omv(dtaInp = cntFrm, fleOut = nmeOut, varDst =
  names(cntFrm), clmDst = FALSE, nmeDst = "euclid")
dtaFrm <- jmvReadWrite::read_omv(nmeOut)
unlink(nmeOut)
# the resulting matrix (20 x 20) with the Euclidian distances
print(dtaFrm)

# calculates the (Euclidean) distances between columns; the original data
# are z-standardized before calculating the distances (stdDst = "z")
jmvReadWrite::distances_omv(dtaInp = cntFrm, fleOut = nmeOut, varDst =
  names(cntFrm), stdDst = "z", nmeDst = "euclid")
dtaFrm <- jmvReadWrite::read_omv(nmeOut)
unlink(nmeOut)
# the resulting matrix (10 x 10) with the Euclidian distances using the
# z-standardized data
print(dtaFrm)

# calculates the correlations between columns
jmvReadWrite::distances_omv(dtaInp = cntFrm, fleOut = nmeOut, varDst =
  names(cntFrm), nmeDst = "correlation")
dtaFrm <- jmvReadWrite::read_omv(nmeOut)
unlink(nmeOut)
# the resulting matrix (10 x 10) with the correlations
print(dtaFrm)

# calculates the chi-square dissimilarity (nmeDst = "chisq") between columns
jmvReadWrite::distances_omv(dtaInp = frqFrm, fleOut = nmeOut, varDst =
  names(frqFrm), nmeDst = "chisq")
dtaFrm <- jmvReadWrite::read_omv(nmeOut)
unlink(nmeOut)
# the resulting matrix (10 x 10) with the chi-square dissimilarities
print(dtaFrm)

# calculates the Jaccard similarity (nmeDst = "jaccard") between columns
jmvReadWrite::distances_omv(dtaInp = binFrm, fleOut = nmeOut, varDst =
  names(binFrm), nmeDst = "jaccard")
dtaFrm <- jmvReadWrite::read_omv(nmeOut)
unlink(nmeOut)
# the resulting matrix (10 x 10) with the Jaccard similarities
print(dtaFrm)


## End(Not run)


[Package jmvReadWrite version 0.4.11 Index]