centMode {flexord} | R Documentation
Centroid Functions for K-Centroids Clustering of (Ordinal) Categorical/Mixed Data
Description
Functions to calculate cluster centroids for K-centroids clustering that extend the options available in package flexclust.
centMode calculates centroids based on the mode of each variable. centMin determines centroids within a specified range that minimize the supplied distance metric. centOptimNA replicates the behaviour of flexclust::centOptim() but removes missing values.
These functions are designed for use with flexclust::kcca() or functions that are built upon it. Their use is easiest via the wrapper kccaExtendedFamily().
Usage
centMode(x)
centMin(x, dist, xrange = NULL)
centOptimNA(x, dist)
Arguments
x |
A numeric matrix or data frame. |
dist |
The distance measure function used in |
xrange |
The range of the data in
|
Details
- centMode: Column-wise modes are used as centroids, and ties are broken randomly. In combination with Simple Matching Distance (distSimMatch), this results in the kmodes algorithm.
- centMin: Column-wise centroids are calculated by minimizing the specified distance measure between the values in x and all possible levels of x.
- centOptimNA: Column-wise centroids are calculated by minimizing the specified distance measure via a general purpose optimizer. Unlike in flexclust::centOptim(), NAs are removed from the starting search values and disregarded in the distance calculation.
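The first two strategies can be illustrated with a minimal base R sketch. The helpers col_mode() and col_min_level() are hypothetical names introduced here and are not the flexord implementations. The sketch assumes numerically coded categories with at least one non-missing value per column, uses absolute differences in place of a user-supplied distance function, and searches only the observed values rather than a range given by xrange.

```r
# Hypothetical helper (not part of flexord): column-wise mode,
# with ties between equally frequent values broken at random.
col_mode <- function(x) {
  x <- as.data.frame(x)
  vapply(x, function(col) {
    counts <- table(col)                       # table() drops NAs by default
    top <- names(counts)[as.vector(counts) == max(counts)]
    if (length(top) > 1) top <- sample(top, 1) # random tie-breaking
    as.numeric(top)
  }, numeric(1))
}

# Hypothetical helper (not part of flexord): for each column, pick the
# observed value with the smallest summed absolute difference to all
# values of that column. Ties go to the smallest candidate.
col_min_level <- function(x) {
  x <- as.data.frame(x)
  vapply(x, function(col) {
    col <- col[!is.na(col)]
    cand <- sort(unique(col))                  # candidate centroid values
    cost <- vapply(cand, function(l) sum(abs(col - l)), numeric(1))
    as.numeric(cand[which.min(cost)])
  }, numeric(1))
}

toy <- data.frame(A = c(1, 1, 3, 4, 5),
                  B = c(2, 2, 2, 4, 4))
col_mode(toy)       # A = 1, B = 2
col_min_level(toy)  # A = 3, B = 2
```

For column A the two strategies disagree: the most frequent value is 1, while 3 has the smallest summed absolute difference to the other values. This is why the choice of centroid function matters for ordinal data.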
Value
A named numeric vector containing the centroid values for each column of x.
See Also
kccaExtendedFamily(), flexclust::kcca()
Examples
# Example: Mode as centroid
dat <- data.frame(A = rep(2:5, 2),
                  B = rep(1:4, 2),
                  C = rep(c(1, 2, 4, 5), 2))
centMode(dat)
## within kcca
flexclust::kcca(dat, 3, family=kccaExtendedFamily('kModes')) #default centroid
# Example: Centroid is level for which distance is minimal
centMin(dat, flexclust::distManhattan, xrange = 'all')
## within kcca
flexclust::kcca(dat, 3,
                family=flexclust::kccaFamily(dist=flexclust::distManhattan,
                                             cent=\(y) centMin(y, flexclust::distManhattan,
                                                               xrange='all')))
# Example: Centroid calculated by general purpose optimizer with NA removal
nas <- sample(c(TRUE, FALSE), prod(dim(dat)),
              replace=TRUE, prob=c(0.1,0.9)) |>
  matrix(nrow=nrow(dat))
dat[nas] <- NA
centOptimNA(dat, flexclust::distManhattan)
## within kcca
flexclust::kcca(dat, 3, family=kccaExtendedFamily('kGower')) #default centroid
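
The idea behind an optimizer-based centroid with NA handling can be sketched in base R. The helper opt_centroid_na() is a hypothetical name and not the flexord implementation. It assumes a summed absolute-difference cost instead of a user-supplied distance function, starts the search at the column means computed without NAs, and skips missing cells when evaluating the cost.

```r
# Hypothetical helper (not part of flexord): centroid found by a general
# purpose optimizer, with NAs excluded from the start values and the cost.
opt_centroid_na <- function(x) {
  x <- as.matrix(x)
  start <- colMeans(x, na.rm = TRUE)          # NA-free starting values
  cost <- function(p) {
    # absolute differences between each column and its candidate centroid;
    # cells that are NA in x are dropped from the sum
    sum(abs(sweep(x, 2, p)), na.rm = TRUE)
  }
  res <- stats::optim(start, cost)            # default Nelder-Mead search
  stats::setNames(res$par, colnames(x))
}

toy_na <- data.frame(A = c(1, 2, NA, 4),
                     B = c(5, NA, 5, 6))
opt_centroid_na(toy_na)
```

Because the cost is not smooth, the default Nelder-Mead search returns an approximate minimizer only. The result is never worse than the starting values, since the start is part of the initial simplex.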