FLXMCregmultinom {flexord}	R Documentation
FlexMix Driver for Regularized Multinomial Mixtures
Description
This model driver can be used to cluster categorical data using a mixture of multinomial distributions, with optional regularization.
Usage
FLXMCregmultinom(formula = . ~ ., r = NULL, alpha = 0)
Arguments
formula
A formula which is interpreted relative to the formula specified in the call to flexmix().
r
Number of different categories. Values are assumed to be integers in 1:r.
alpha
A non-negative scalar acting as regularization parameter. Can be regarded as adding alpha observations equal to the population mean to each component.
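Because the driver assumes that every variable j only takes integer values in 1:r[j], it can help to check the data before fitting. The function below is a minimal sketch; check_categories() is a hypothetical helper and not part of flexord, and recycling a scalar r across all columns is the helper's own choice.

```r
# Hypothetical helper (not part of flexord): returns TRUE if every column j
# of x contains only integers in 1:r[j], the coding FLXMCregmultinom assumes.
check_categories <- function(x, r) {
  x <- as.matrix(x)
  r <- rep_len(r, ncol(x))  # helper's own choice: recycle a scalar r
  is_integer <- all(x == round(x))
  in_range <- all(vapply(seq_len(ncol(x)), function(j) {
    all(x[, j] >= 1 & x[, j] <= r[j])
  }, logical(1)))
  is_integer && in_range
}

x <- cbind(c(1, 2, 3), c(1, 1, 2))
check_categories(x, r = c(3, 2))  # TRUE
check_categories(x, r = 2)        # FALSE: column 1 contains a 3
```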
Details
Using a regularization parameter alpha greater than zero acts as adding alpha observations conforming to the population mean to each component. This can be used to avoid degenerate solutions. It also has the effect that the clusters become more similar to each other the larger alpha is chosen. For small values of alpha, however, this effect is mostly negligible.
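The pseudo-observation interpretation can be illustrated for a single variable in a single component. The sketch below assumes the regularized estimate is (weighted counts + alpha * marginal proportions) / (sum of weights + alpha); reg_probs() is a hypothetical function, and the driver's internal code may be organized differently.

```r
# Sketch (assumed form, not flexord's code): regularized category probabilities
# for ONE variable in ONE component. alpha pseudo-observations are spread over
# the categories according to the population (marginal) proportions.
reg_probs <- function(x, w, r, alpha = 0) {
  # x: values in 1:r for all observations; w: posterior weights for this component
  counts   <- vapply(seq_len(r), function(j) sum(w[x == j]), numeric(1))
  marginal <- tabulate(x, nbins = r) / length(x)  # population proportions
  (counts + alpha * marginal) / (sum(w) + alpha)
}

x <- c(1, 1, 2, 3, 3, 3)
w <- c(1, 1, 1, 0, 0, 0)  # the component only "owns" the first three observations
reg_probs(x, w, r = 3, alpha = 0)  # 2/3, 1/3, 0: degenerate zero for category 3
reg_probs(x, w, r = 3, alpha = 1)  # 7/12, 7/24, 1/8: all probabilities positive
```

With alpha = 0 the unobserved category gets probability exactly zero; with alpha = 1 the estimate is pulled towards the marginal proportions (1/3, 1/6, 1/2), which is also why larger alpha makes components more alike.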
For regularization, we compute maximum a posteriori (MAP) estimates of the multinomial parameters, using the Dirichlet distribution, the conjugate prior of the multinomial, as prior. The parameters of this prior are chosen to correspond to the marginal distribution of the variable across all observations.
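The link between the MAP estimate and the "alpha added observations" reading can be written out. The derivation below is a sketch that assumes Dirichlet parameters a_j = 1 + alpha * pbar_j, where pbar_j is the marginal proportion of category j and w_ik the posterior weight of observation i in component k; this parametrization is an assumption chosen to match the description in the Details, not one stated explicitly on this page.

```latex
\begin{aligned}
n_{kj} &= \sum_{i=1}^{N} w_{ik}\,\mathbb{1}\{x_i = j\}, \qquad n_k = \sum_{j=1}^{r} n_{kj},\\
p(\boldsymbol{\pi}_k \mid \mathbf{x}) &\propto \prod_{j=1}^{r} \pi_{kj}^{\,n_{kj} + a_j - 1},\\
\hat{\pi}_{kj}^{\mathrm{MAP}} &= \frac{n_{kj} + a_j - 1}{n_k + \sum_{l=1}^{r} a_l - r},\\
a_j = 1 + \alpha \bar{p}_j,\ \sum_{j} \bar{p}_j = 1
\;\Rightarrow\; \hat{\pi}_{kj}^{\mathrm{MAP}} &= \frac{n_{kj} + \alpha \bar{p}_j}{n_k + \alpha}.
\end{aligned}
```

For alpha = 0 this reduces to the unregularized maximum likelihood estimate n_kj / n_k.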
Value
An object of class "FLXC".
References
Galindo Garre, F, Vermunt, JK (2006). Avoiding Boundary Estimates in Latent Class Analysis by Bayesian Posterior Mode Estimation. Behaviormetrika, 33, 43-59.
Ernst, D, Ortega Menjivar, L, Scharl, T, Grün, B (2025). Ordinal Clustering with the flex-Scheme. Austrian Journal of Statistics. Submitted manuscript.
Examples
library("flexmix")
library("flexord")
library("flexclust")
set.seed(0xdeaf)
# Sample data
k <- 4 # nr of clusters
nvar <- 10 # nr of variables
r <- sample(2:7, size=nvar, replace=TRUE) # nr of categories
N <- 100 # obs. per cluster
# random probabilities per component
probs <- lapply(seq_len(k), \(ki) runif(nvar, 0.01, 0.99))
# sample data by drawing from a binomial distribution with size = r - 1;
# the driver expects values to lie inside 1:r, hence we add 1.
dat <- lapply(probs, \(p) {
mapply(\(p_i, r_i) {
rbinom(N, r_i, p_i) + 1
}, p, r-1, SIMPLIFY=FALSE) |> do.call(cbind, args=_)
}) |> do.call(rbind, args=_)
true_clusters <- rep(seq_len(k), each = N)  # N observations per true cluster
# Cluster without regularization
m1 <- stepFlexmix(dat~1, model=FLXMCregmultinom(r=r, alpha=0), k=k)
# Cluster with regularization
m2 <- stepFlexmix(dat~1, model=FLXMCregmultinom(r=r, alpha=1), k=k)
# Both models are mostly able to reconstruct the true clusters (ARI ~ 0.95)
# (it's a very easy clustering problem)
# Small values for the regularization don't seem to affect the ARI (much)
randIndex(clusters(m1), true_clusters)
randIndex(clusters(m2), true_clusters)