SC_AMKM {SCDA}    R Documentation

Spatial Clustering for sf data

Description

Perform spatial clustering using K-means and AMKM (Adjacent Matrix K-Means) algorithms on sf data.

Usage

SC_AMKM(
  Data_sf,
  IndexCol,
  Method,
  Distance = "euclidean",
  MinNc = 2,
  MaxNc = 10,
  Metric = "silhouette",
  RidDim = "pca",
  CenterVars = TRUE,
  ScaleVars = TRUE,
  MakePlot = TRUE,
  ExplainedVariance = 0.9,
  KeepCoord = TRUE,
  Seed = 123456789,
  Verbose = TRUE,
  CRS = 4326
)

Arguments

Data_sf

A data.frame object of class sf with n rows (each one corresponding to a location) and a user-defined number of columns. It must include the geometry column for spatial modelling and representation. Typically, an sf data.frame is built using the st_as_sf(...) command from the sf package (see its documentation for details).

IndexCol

Integer value. Position (column number) of the dataset ID column. If there is no ID column, set IndexCol=0.

Method

Character. Must be one of: 'AMKM' or 'K-means'. If Method='AMKM', the Adjacent Matrix K-Means clustering is performed. If Method='K-means', K-means clustering is performed.

Distance

Character. The distance measure used to compute the dissimilarity matrix. This must be one of: "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski". Default is Distance='euclidean'.
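
The accepted values coincide with the method options of base R's dist function. Whether SC_AMKM calls dist internally is an assumption, not something stated here. A minimal sketch on hypothetical toy data shows how the choice of measure changes the dissimilarities:

```r
# Toy feature matrix: 3 locations, 2 features (hypothetical values)
X <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)

# Dissimilarity matrices under two of the supported measures
d_euc <- as.matrix(dist(X, method = "euclidean"))
d_man <- as.matrix(dist(X, method = "manhattan"))

d_euc[1, 2]  # 5 (straight-line distance between rows 1 and 2)
d_man[1, 2]  # 7 (sum of absolute coordinate differences)
```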

MinNc

Integer value. Minimum number of clusters, between 1 and (number of objects - 1). Default is MinNc=2.

MaxNc

Integer value. Maximum number of clusters, between 2 and (number of objects - 1), greater than or equal to MinNc. Default is MaxNc=10.

Metric

Character. The validation index used to select the optimal clustering partition. This should be one of: "kl", "ch", "hartigan", "ccc", "scott", "marriot", "trcovw", "tracew", "friedman", "rubin", "cindex", "db", "silhouette", "duda", "pseudot2", "beale", "ratkowsky", "ball", "ptbiserial", "gap", "frey", "mcclain", "gamma", "gplus", "tau", "dunn", "hubert", "sdindex", "dindex", "sdbw", "all" (all indices except Gap, Gamma, Gplus and Tau), "alllong" (all indices, with Gap, Gamma, Gplus and Tau included). Default is Metric='silhouette'.
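
To illustrate how a validation index can drive the choice of the number of clusters, here is a minimal base-R sketch using the average silhouette width. The helper avg_silhouette and the toy data are hypothetical and are not part of SCDA. The package's actual selection procedure may differ.

```r
# Average silhouette width from a distance matrix D and cluster labels cl
avg_silhouette <- function(D, cl) {
  n <- length(cl)
  s <- numeric(n)
  for (i in seq_len(n)) {
    same <- cl == cl[i]
    same[i] <- FALSE
    if (!any(same)) next                 # singleton cluster: s(i) = 0 by convention
    other <- cl != cl[i]
    a <- mean(D[i, same])                # mean distance within own cluster
    b <- min(tapply(D[i, other], cl[other], mean))  # nearest other cluster
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

set.seed(1)
# Two well-separated toy groups (hypothetical data)
X <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))
D <- as.matrix(dist(X))

ks <- 2:5   # plays the role of MinNc:MaxNc
scores <- sapply(ks, function(k) {
  avg_silhouette(D, kmeans(X, centers = k, nstart = 10)$cluster)
})
best_k <- ks[which.max(scores)]   # candidate with the highest average silhouette
```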

RidDim

Character. The dimensionality reduction method. This should be one of: 'pca' or 'laplacian'. If RidDim='pca', a principal component analysis is performed. If RidDim='laplacian', the Laplacian matrix dimensionality reduction method is performed. Default is RidDim='pca'.

CenterVars

Logical value (TRUE or FALSE) stating whether the features have to be centered around the mean. Default is TRUE.

ScaleVars

Logical value (TRUE or FALSE) stating whether the features have to be scaled with respect to their standard deviation. Default is TRUE.
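
As an illustration of what centering and scaling do to the features, the following sketch uses base R's scale function on hypothetical values. That SC_AMKM relies on scale internally is an assumption.

```r
# Two features on very different scales (hypothetical values)
X <- cbind(a = c(1, 2, 3), b = c(10, 20, 60))

# Center each column on its mean and divide by its standard deviation
Z <- scale(X, center = TRUE, scale = TRUE)

colMeans(Z)      # approximately 0 for every column
apply(Z, 2, sd)  # 1 for every column
```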

MakePlot

Logical value (TRUE or FALSE) stating whether the plot must be displayed. Default is TRUE.

ExplainedVariance

Numeric. Cumulative proportion of the variance explained by the eigenvalues of the dimensionality reduction method. Must be between 0 and 1. Default is ExplainedVariance=0.9.
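
A minimal sketch of how such a threshold can be used to pick the number of retained components, assuming a PCA computed with base R's prcomp on hypothetical data. The variable names are illustrative and not taken from SCDA.

```r
set.seed(42)
# Toy data: 50 observations, 4 features, two of them strongly correlated
X <- matrix(rnorm(200), ncol = 4)
X[, 2] <- X[, 1] + rnorm(50, sd = 0.1)

pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by the leading components
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

threshold <- 0.9                          # plays the role of ExplainedVariance
n_comp <- which(cum_var >= threshold)[1]  # smallest number of components reaching it
scores <- pca$x[, seq_len(n_comp), drop = FALSE]
```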

KeepCoord

Logical value (TRUE or FALSE) stating whether the coordinates must be taken into account in the K-means algorithm. Available only when Method='K-means'. Default is TRUE.

Seed

Integer value. Sets the state of the random number generator (RNG) in R, for reproducibility. Default is Seed=123456789.

Verbose

Logical value (TRUE or FALSE). Toggle warnings and messages. If Verbose=TRUE (default), the function prints messages describing the progress of the tasks. If Verbose=FALSE, all progress messages are suppressed.

CRS

Integer value. Coordinate reference system: any value suitable as input to the st_crs command from the sf package (see its documentation for details). Default is CRS=4326.

Details

AMKM decomposes the input dataset into two subsets: the first contains the features, the second contains the coordinates (longitude and latitude). A dissimilarity matrix is computed on each subset, using the measure given by the Distance argument for the features and the Great Circle distance for the coordinates. An adjacency matrix (n x n) is then computed from each dissimilarity matrix using a Gaussian kernel. To reduce the dimensionality of the adjacency matrix, a dimensionality reduction method is required (see the RidDim argument). K-means is applied with no modification to its original algorithm.
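
The steps described above can be sketched in base R as follows. This is an illustrative approximation, not the package's implementation. The helpers haversine and gauss_kernel, the kernel bandwidth (median pairwise distance), the way the two adjacency matrices are combined (element-wise product), and the toy data are all assumptions, since none of them are specified here.

```r
# Great-circle (haversine) distance in km between all pairs of lon/lat points
haversine <- function(lon, lat, r = 6371) {
  lon <- lon * pi / 180
  lat <- lat * pi / 180
  dlon <- outer(lon, lon, "-")
  dlat <- outer(lat, lat, "-")
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  2 * r * asin(pmin(sqrt(a), 1))
}

# Gaussian kernel turning a dissimilarity matrix into an adjacency matrix
gauss_kernel <- function(D) {
  sigma <- median(D[upper.tri(D)])   # assumed bandwidth choice
  exp(-D^2 / (2 * sigma^2))
}

set.seed(123456789)
n <- 30
lon <- runif(n, 9, 10)                           # hypothetical coordinates
lat <- runif(n, 45, 46)
feats <- scale(matrix(rnorm(n * 3), ncol = 3))   # hypothetical features

# One dissimilarity matrix per subset
D_feat <- as.matrix(dist(feats, method = "euclidean"))
D_geo  <- haversine(lon, lat)

# One adjacency matrix per dissimilarity matrix, then combined
# (element-wise product is an assumption of this sketch)
A <- gauss_kernel(D_feat) * gauss_kernel(D_geo)

# Dimensionality reduction via PCA, keeping 90% of the variance
pca <- prcomp(A, center = TRUE)
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_comp <- which(cum_var >= 0.9)[1]
scores <- pca$x[, seq_len(n_comp), drop = FALSE]

# Standard K-means on the reduced representation
km <- kmeans(scores, centers = 3, nstart = 10)
table(km$cluster)
```

Combining the two kernels by multiplication means two locations are treated as adjacent only when they are close both in feature space and geographically. Other combinations (for example a weighted sum) would be equally plausible under the description given.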

Value

A list object containing the following outputs:

Author(s)

Camilla Lionetti <lionetticamilla511@gmail.com>, Francesco Caccia <francesco.caccia2000@gmail.com>

Examples

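library(sp)   # provides the meuse dataset
library(sf)

data("meuse")
dati <- meuse
# Keep only the numeric columns
dati <- subset(dati, select = sapply(dati, is.numeric))
# Convert to an sf object (EPSG:28992, Amersfoort / RD New)
dati <- st_as_sf(dati, coords = c("x", "y"), crs = 28992)
# AMKM clustering with the number of clusters fixed at 5
SC <- SC_AMKM(Data_sf = dati, IndexCol = 0, Method = "AMKM", MinNc = 5, MaxNc = 5, CRS = 28992)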


[Package SCDA version 0.0.2 Index]