Random.hclustering.validity {clusterv} | R Documentation |
Random hierarchical clustering and validity index computation using random projections of data.
Description
This function applies a hierarchical clustering algorithm to the data and then assesses the reliability of the resulting clusters using multiple randomized projections of the data. It computes the validity index of each individual cluster, the overall validity index of the clustering and, if requested, the AC indices. Different hierarchical clustering methods may be used (e.g. average, complete and single linkage, or Ward's method), as well as different randomized maps (e.g. PMO, Achlioptas, Normal, Random Subspace projections). The function assumes that the labels of the examples are integers from 1 to ncol(M).
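The general idea described above can be sketched in base R (package stats only). This is a minimal illustration under stated assumptions, not the clusterv implementation: the helper cluster.examples, the toy data and all variable names are hypothetical, only random subspace ("RS") projections are shown, and the scaling step controlled by the scale argument is omitted.

```r
# Minimal sketch in base R; NOT the clusterv implementation.
set.seed(100)

# Toy data: rows are variables, columns are examples (same layout as M).
# Three groups of 10 examples each, with shifted means.
n.var <- 200
M <- cbind(matrix(rnorm(n.var * 10, mean = 0), nrow = n.var),
           matrix(rnorm(n.var * 10, mean = 3), nrow = n.var),
           matrix(rnorm(n.var * 10, mean = 6), nrow = n.var))

# Hypothetical helper: cluster the examples (columns) of a matrix.
cluster.examples <- function(X, c, hmethod = "average") {
  tree <- hclust(dist(t(X)), method = hmethod)  # dist() compares rows, hence t()
  cutree(tree, k = c)
}

c.clusters <- 3   # number of clusters
sub.dim    <- 30  # subspace dimension
n.proj     <- 20  # number of random projections
n.ex       <- ncol(M)

# Clustering in the original space.
orig.cluster <- cluster.examples(M, c.clusters)

# Count how often each pair of examples falls in the same cluster
# across the random subspace projections.
Sim <- matrix(0, n.ex, n.ex)
for (i in seq_len(n.proj)) {
  vars <- sample(nrow(M), sub.dim)              # random subset of variables
  cl   <- cluster.examples(M[vars, , drop = FALSE], c.clusters)
  Sim  <- Sim + outer(cl, cl, "==")             # TRUE where a pair shares a cluster
}
Sim <- Sim / n.proj                             # pairwise similarity in [0, 1]
```

Pairs of examples that stay together in most projected clusterings get a similarity close to 1, which is the kind of information a stability-based validity index can be built on.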
Usage
Random.hclustering.validity(M, dim, pmethod = "RS", c = 3, hmethod = "average",
n = 50, scale = TRUE, seed = 100, AC=TRUE,
distance="euclidean")
Arguments
M |
matrix of data: rows are variables and columns are examples |
dim |
subspace dimension |
pmethod |
projection method. It must be one of the following: "RS" (random subspace projection), "PMO" (Plus Minus One random projection), "Norm" (normal random projection), "Achlioptas" (Achlioptas random projection) |
c |
number of clusters |
hmethod |
the agglomeration method to be used. This should be one of
"ward.D", "single", "complete", "average", "mcquitty", "median" or "centroid",
according to the |
n |
number of random projections |
scale |
if TRUE (default) the random projections are scaled |
seed |
numerical seed for the random generator |
AC |
if TRUE (default) the AC indices are computed. |
distance |
distance measure between examples. It must be either "euclidean" (default) or "pearson" (that is, 1 - Pearson correlation). |
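As a rough illustration of the pmethod, dim and distance arguments, the following base R sketch builds the four kinds of randomized maps and the two distances by hand. The constructions are the textbook ones and the 1/sqrt(d) scaling is an assumption; the package functions listed under See Also may differ in detail, and all variable names here are hypothetical.

```r
set.seed(100)
M <- matrix(rnorm(500 * 12), nrow = 500)    # toy data: 500 variables, 12 examples
d <- 30                                     # target dimension (the dim argument)
p <- nrow(M)

# Hypothetical d x p projection matrices; the scaling factors are assumptions.
R.pmo  <- matrix(sample(c(-1, 1), d * p, replace = TRUE), nrow = d) / sqrt(d)
R.norm <- matrix(rnorm(d * p), nrow = d) / sqrt(d)
R.ach  <- matrix(sample(c(-1, 0, 1), d * p, replace = TRUE,
                        prob = c(1/6, 2/3, 1/6)), nrow = d) * sqrt(3 / d)

P.pmo <- R.pmo %*% M                        # projected data: d x 12
P.rs  <- M[sample(p, d), , drop = FALSE]    # random subspace: keep d variables

# Distances between examples (columns of the projected data).
d.euclidean <- dist(t(P.pmo))               # dist() compares rows, hence t()
d.pearson   <- as.dist(1 - cor(P.pmo))      # cor() compares columns directly

# Either distance object can be passed to hclust() with the chosen hmethod.
tree <- hclust(d.pearson, method = "average")
```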
Value
a list with eight components: "validity", "overall.validity", "similarity.matrix", "dim", "cluster", "tree", "orig.tree", "orig.cluster", plus the optional "AC" component:
validity |
a vector with the validity of each of the c clusters |
overall.validity |
validity index of the overall clustering |
similarity.matrix |
pairwise similarity matrix between examples |
dimension |
random projection dimension |
cluster |
list of the n clusterings obtained by randomized hierarchical clustering |
tree |
list of the n trees obtained by the randomized hierarchical clustering |
orig.tree |
tree built in the original space |
orig.cluster |
list of the clusters in the original space |
AC |
matrix with the Assignment Confidence index for each example. Each row corresponds to an example, each column to a cluster (optional) |
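This page does not state how the indices are derived from the pairwise similarity matrix. The sketch below shows one plausible definition, assumed here purely for illustration: the validity of a cluster is the mean similarity over distinct pairs of its members, the overall validity is the mean of the cluster validities, and the AC-style index of an example with respect to a cluster is its mean similarity to the other members of that cluster. The function cluster.validity and the toy values are hypothetical; see Validity.indices and AC.index for the actual computations.

```r
# Toy similarity matrix for 6 examples and a clustering into 2 clusters.
Sim <- matrix(0.1, 6, 6)
Sim[1:3, 1:3] <- 0.9
Sim[4:6, 4:6] <- 0.6
diag(Sim) <- 1
cluster <- c(1, 1, 1, 2, 2, 2)

# Assumed definition: mean similarity over distinct pairs within a cluster.
cluster.validity <- function(Sim, members) {
  if (length(members) < 2) return(NA_real_)  # left undefined for singletons here
  S <- Sim[members, members]
  mean(S[upper.tri(S)])
}

validity <- sapply(split(seq_along(cluster), cluster),
                   function(idx) cluster.validity(Sim, idx))
overall.validity <- mean(validity, na.rm = TRUE)

# Assumed AC-style index: mean similarity of each example to the other
# members of each cluster (rows = examples, columns = clusters).
AC <- sapply(sort(unique(cluster)), function(k) {
  sapply(seq_along(cluster), function(i) {
    others <- setdiff(which(cluster == k), i)
    if (length(others) == 0) NA_real_ else mean(Sim[i, others])
  })
})
```

With these toy values the first cluster is more stable than the second, and every example has a higher AC-style value for its own cluster than for the other one.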
Author(s)
Giorgio Valentini valentini@di.unimi.it
See Also
Achlioptas.random.projection, Plus.Minus.One.random.projection, norm.random.projection, random.subspace, Cluster.validity, Validity.indices, AC.index
Examples
# Assessment of the reliability of clusters discovered
# by hierarchical clustering using RS projections.
M <- generate.sample0(n = 10, m = 2, sigma = 2, dim = 800)
l <- Random.hclustering.validity(M, dim = 30, pmethod = "RS", c = 3,
                                 hmethod = "average", n = 20)
# The same as above, but using PMO projections.
l <- Random.hclustering.validity(M, dim = 30, pmethod = "PMO", c = 3,
                                 hmethod = "average", n = 20)
# The same as above, but evaluating clusterings with 5 clusters.
l <- Random.hclustering.validity(M, dim = 30, pmethod = "PMO", c = 5,
                                 hmethod = "average", n = 20)
# The same as above, but evaluating clusterings with 10 clusters.
l <- Random.hclustering.validity(M, dim = 30, pmethod = "PMO", c = 10,
                                 hmethod = "average", n = 20)
# Assessment of the reliability of the clusters using projections
# with limited distortion (max. expansion lower than 1.3 according
# to the Johnson-Lindenstrauss lemma).
d <- JL.predict.dim(n = 30, epsilon = 0.3)
l <- Random.hclustering.validity(M, dim = d, pmethod = "PMO", c = 3,
                                 hmethod = "average", n = 20)