utils_cluster_silhouette {distantia}R Documentation

Compute Silhouette Width of a Clustering Solution

Description

The silhouette width is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation).

There are some general guidelines to interpret the individual silhouette widths of the clustered objects (as returned by this function when mean = FALSE):

When mean = TRUE, the overall mean of the silhouette widths of all objects is returned. These values should be interpreted as follows:

This metric may not perform well when the clusters have irregular shapes or sizes.

This code was adapted from https://svn.r-project.org/R-packages/trunk/cluster/R/silhouette.R.

Usage

utils_cluster_silhouette(labels = NULL, d = NULL, mean = FALSE)

Arguments

labels

(required, integer vector) Labels resulting from a clustering algorithm applied to d. Must have the same length as columns and rows in d. Default: NULL

d

(required, matrix) distance matrix typically resulting from distantia_matrix(), but any other square matrix should work. Default: NULL

mean

(optional, logical) If TRUE, the mean of the silhouette widths is returned. Default: FALSE

Value

data frame

See Also

Other distantia_support: distantia_aggregate(), distantia_boxplot(), distantia_cluster_hclust(), distantia_cluster_kmeans(), distantia_matrix(), distantia_model_frame(), distantia_spatial(), distantia_stats(), distantia_time_delay(), utils_block_size(), utils_cluster_hclust_optimizer(), utils_cluster_kmeans_optimizer()

Examples

#weekly covid prevalence in three California counties
#load as tsl
#subset first 10 time series
#sum by month
tsl <- tsl_initialize(
  x = covid_prevalence,
  name_column = "name",
  time_column = "time"
) |>
  tsl_subset(
    names = 1:10
  ) |>
  tsl_aggregate(
    new_time = "months",
    method = max
  )

#compute dissimilarity
distantia_df <- distantia(
  tsl = tsl,
  lock_step = TRUE
)

#generate dissimilarity matrix
psi_matrix <- distantia_matrix(
  df = distantia_df
)

#example with kmeans clustering
#------------------------------------

#kmeans with 3 groups
psi_kmeans <- stats::kmeans(
  x = as.dist(psi_matrix[[1]]),
  centers = 3
)

#case-wise silhouette width
utils_cluster_silhouette(
  labels = psi_kmeans$cluster,
  d = psi_matrix
)

#overall silhouette width
utils_cluster_silhouette(
  labels = psi_kmeans$cluster,
  d = psi_matrix,
  mean = TRUE
)


#example with hierarchical clustering
#------------------------------------

#hierarchical clustering
psi_hclust <- stats::hclust(
  d = as.dist(psi_matrix[[1]])
)

#generate labels for three groups
psi_hclust_labels <- stats::cutree(
  tree = psi_hclust,
  k = 3,
)

#case-wise silhouette width
utils_cluster_silhouette(
  labels = psi_hclust_labels,
  d = psi_matrix
)

#overall silhouette width
utils_cluster_silhouette(
  labels = psi_hclust_labels,
  d = psi_matrix,
  mean = TRUE
)

[Package distantia version 2.0.2 Index]