centMode {flexord}    R Documentation

Centroid Functions for K-Centroids Clustering of (Ordinal) Categorical/Mixed Data

Description

Functions to calculate cluster centroids for K-centroids clustering that extend the options available in package flexclust.

centMode calculates centroids as the mode of each variable. centMin determines, within a specified range, the centroid values that minimize the supplied distance measure. centOptimNA replicates the behaviour of flexclust::centOptim() but removes missing values.
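To make the mode-based centroid concrete, here is a minimal base-R sketch. The helper mode_centroid is hypothetical and not part of flexord; in particular, breaking ties by taking the smallest value is an assumption and may differ from what centMode actually does.

```r
# Hypothetical helper, not part of flexord: columnwise mode of a numeric
# matrix or data frame. Ties are broken by taking the smallest value
# (an assumption; centMode() may resolve ties differently).
mode_centroid <- function(x) {
  x <- as.matrix(x)
  apply(x, 2, function(col) {
    counts <- table(col)  # frequencies of observed values, NAs dropped
    as.numeric(names(counts)[which.max(counts)])
  })
}

toy <- data.frame(A = c(1, 2, 2, 3),
                  B = c(4, 4, 5, 5),
                  C = c(1, 1, 1, 2))
mode_centroid(toy)
```

The result is a named numeric vector with one entry per column, which matches the shape described under Value.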

These functions are designed for use with flexclust::kcca() or functions that are built upon it. Their use is easiest via the wrapper kccaExtendedFamily().

Usage

centMode(x)

centMin(x, dist, xrange = NULL)

centOptimNA(x, dist)
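Both centMin and centOptimNA take a distance function as dist. As an illustration of the calling convention, the sketch below is a base-R stand-in that follows the (x, centers) signature used by flexclust distance functions such as flexclust::distManhattan, returning one row per observation and one column per centroid. The name manhattan_dist is hypothetical and the code is not the flexclust implementation.

```r
# Hypothetical stand-in for a flexclust-style distance function.
# x:       numeric matrix or data frame, one observation per row
# centers: numeric matrix, one centroid per row (a plain vector is
#          treated as a single centroid)
manhattan_dist <- function(x, centers) {
  x <- as.matrix(x)
  if (is.null(dim(centers))) centers <- matrix(centers, nrow = 1)
  stopifnot(ncol(x) == ncol(centers))
  out <- matrix(0, nrow = nrow(x), ncol = nrow(centers))
  for (k in seq_len(nrow(centers))) {
    # sweep() subtracts centroid k from every row of x
    out[, k] <- rowSums(abs(sweep(x, 2, centers[k, ])))
  }
  out
}

x <- rbind(c(1, 2), c(3, 5))
centers <- rbind(c(1, 1), c(3, 3))
manhattan_dist(x, centers)
```

A function of this shape could be passed wherever the Usage above expects dist, assuming it handles the data types and missing values present in x.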

Arguments

x

A numeric matrix or data frame.

dist

The distance measure function used in centMin and centOptimNA.

xrange

The range of the data in x. Currently only used for centMin. Options are:

  • NULL (default): equivalent to "all".

  • "all": uses the same minimum and maximum value for each column of x, taken from the overall range of values in x.

  • "columnwise": uses different minimum and maximum values for each column of x, taken from the range of values in each column.

  • A vector c(min, max): specifies the same minimum and maximum value for each column of x.

  • A list of vectors list(c(min1, max1), c(min2, max2), ...) of length ncol(x): specifies different minimum and maximum values for each column of x.
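The xrange options above can be read as different ways of arriving at one (min, max) pair per column. The following base-R sketch shows one way this normalisation could look, followed by a simple search for the level in each column's range with the smallest summed absolute distance. Both helpers (resolve_xrange, min_level_centroid) are hypothetical, and the integer-spaced candidate levels and the columnwise Manhattan criterion are assumptions made for illustration, not a description of how centMin is implemented.

```r
# Hypothetical helper: turn any of the documented xrange forms into a
# list with one c(min, max) vector per column of x.
resolve_xrange <- function(x, xrange = NULL) {
  x <- as.matrix(x)
  p <- ncol(x)
  if (is.null(xrange)) xrange <- "all"
  if (is.character(xrange)) {
    xrange <- match.arg(xrange, c("all", "columnwise"))
    if (xrange == "all") {
      return(rep(list(range(x, na.rm = TRUE)), p))
    }
    return(lapply(seq_len(p), function(j) range(x[, j], na.rm = TRUE)))
  }
  if (is.list(xrange)) {
    stopifnot(length(xrange) == p)
    return(xrange)
  }
  stopifnot(is.numeric(xrange), length(xrange) == 2)
  rep(list(xrange), p)
}

# Hypothetical helper: for each column, pick the integer-spaced level in
# its range that minimises the summed absolute distance to the data.
# Ties go to the smallest level (an assumption).
min_level_centroid <- function(x, xrange = NULL) {
  x <- as.matrix(x)
  ranges <- resolve_xrange(x, xrange)
  cent <- vapply(seq_len(ncol(x)), function(j) {
    levels <- seq(ranges[[j]][1], ranges[[j]][2])
    cost <- vapply(levels,
                   function(l) sum(abs(x[, j] - l), na.rm = TRUE),
                   numeric(1))
    as.numeric(levels[which.min(cost)])
  }, numeric(1))
  names(cent) <- colnames(x)
  cent
}

dat <- data.frame(A = rep(2:5, 2),
                  B = rep(1:4, 2),
                  C = rep(c(1, 2, 4, 5), 2))
resolve_xrange(dat, "columnwise")
min_level_centroid(dat, xrange = "all")
```

With "all", every column is searched over the same levels; with "columnwise" or a list, levels never observed in a column can be excluded or included explicitly.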

Details

Value

A named numeric vector containing the centroid values for each column of x.

See Also

kccaExtendedFamily(), flexclust::kcca()

Examples

# Example: Mode as centroid
dat <- data.frame(A = rep(2:5, 2),
                  B = rep(1:4, 2),
                  C = rep(c(1, 2, 4, 5), 2))
centMode(dat)
## within kcca
flexclust::kcca(dat, 3, family=kccaExtendedFamily('kModes')) #default centroid

# Example: Centroid is level for which distance is minimal
centMin(dat, flexclust::distManhattan, xrange = 'all')
## within kcca
flexclust::kcca(dat, 3,
                family=flexclust::kccaFamily(dist=flexclust::distManhattan,
                                             cent=\(y) centMin(y, flexclust::distManhattan,
                                                               xrange='all')))
                             
# Example: Centroid calculated by general purpose optimizer with NA removal
set.seed(1) # make the random NA pattern reproducible
# mark roughly 10% of the cells as missing
nas <- sample(c(TRUE, FALSE), prod(dim(dat)),
              replace=TRUE, prob=c(0.1,0.9)) |> 
       matrix(nrow=nrow(dat))
dat[nas] <- NA
centOptimNA(dat, flexclust::distManhattan)
## within kcca
flexclust::kcca(dat, 3, family=kccaExtendedFamily('kGower')) #default centroid

[Package flexord version 1.0.0 Index]