gicf {gicf}    R Documentation

Penalised maximum likelihood covariance matrix estimation

Description

Estimation of a sparse covariance matrix via the ridge-regularised covglasso estimator described in Cibinel et al. (2024).

Usage

gicf(
  data = NULL,
  S = NULL,
  n = NULL,
  lambda = 0,
  kappa = 0,
  max.iter = 2500,
  tol = 1e-04,
  Sigma.init = NULL,
  adj = NULL
)

Arguments

data

A numerical matrix whose rows contain the observations of a multivariate normal random vector. If NULL, the sample covariance matrix S and the dataset size n must be provided.

S

The sample covariance matrix. Must be provided if data is NULL.

n

The dataset size. Must be provided if data is NULL.

lambda

A vector of non-negative lasso parameters. For efficiency, the values should be sorted from largest to smallest.

kappa

A non-negative ridge regularisation parameter.

max.iter

The maximum number of iterations allowed for the coordinate descent algorithm.

tol

A numerical tolerance below which quantities are treated as zero.

Sigma.init

The initial guess for the coordinate descent algorithm. Defaults to the diagonal of the sample covariance matrix.

adj

An optional matrix whose pattern of zeroes is enforced onto the final output of the algorithm.
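As a rough illustration of the arguments above, the following sketch prepares the inputs for a call that supplies S and n instead of data, with lambda sorted from largest to smallest. The toy data and the names X, lambdas and fit_S are assumptions made for this sketch. Whether S is expected with divisor n or n - 1 is not stated here, so the use of cov() is also an assumption.

```r
set.seed(1)

# Toy data (illustrative only): 200 observations of 4 variables
X <- matrix(rnorm(200 * 4), nrow = 200, ncol = 4)

n <- nrow(X)
S <- cov(X)  # assumption: divisor n - 1; rescale by (n - 1) / n if the MLE is wanted

# Sort the lasso parameters from largest to smallest, as recommended
lambdas <- sort(c(0.01, 0.1, 0.05), decreasing = TRUE)

# Only attempt the fit when the gicf package is installed
if (requireNamespace("gicf", quietly = TRUE)) {
  fit_S <- gicf::gicf(S = S, n = n, lambda = lambdas, kappa = 0.1)
}
```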

Details

This function computes the ridge-regularised covglasso estimator of the covariance matrix of a multivariate normal distribution, that is, it computes the maximiser of the penalised log-likelihood

-\log|\Sigma| - \operatorname{trace}(\Sigma^{-1}S) - \lambda\|\Sigma - \operatorname{diag}(\Sigma)\|_1 - \kappa\|\Sigma^{-1}\|_1,

where \lambda, \kappa \geq 0. The optimum is computed via a coordinate descent algorithm, resulting in an approach which unifies and extends the methods of Chaudhuri et al. (2007), Warton (2008), Bien and Tibshirani (2011) and Wang (2014).
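To make the objective concrete, the sketch below evaluates the expression above for a candidate covariance matrix. The helper penalised_objective is hypothetical and not part of the package, and it assumes that \|\cdot\|_1 denotes the sum of absolute values of the entries. The loglikpen value returned by gicf may differ from this by scaling or additive constants.

```r
# Hypothetical helper (not part of the gicf package): evaluates the
# penalised objective from the Details section for a candidate Sigma.
# Assumption: the 1-norm is the elementwise sum of absolute values.
penalised_objective <- function(Sigma, S, lambda = 0, kappa = 0) {
  Omega <- solve(Sigma)  # Sigma^{-1}
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  # Sigma with its diagonal set to zero
  offdiag <- Sigma - diag(diag(Sigma), nrow = nrow(Sigma))
  -logdet -
    sum(diag(Omega %*% S)) -
    lambda * sum(abs(offdiag)) -
    kappa * sum(abs(Omega))
}

# With Sigma = S = I_2 the expression reduces to -2 - 2 * kappa
obj <- penalised_objective(diag(2), diag(2), lambda = 0.05, kappa = 0.1)
```

Evaluating the objective this way can be handy for comparing two candidate matrices by hand, although it says nothing about how the coordinate descent itself proceeds.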

Value

If a scalar value for lambda is provided, the function returns a list containing the following elements:

sigma      The estimate of the covariance matrix.
omega      The inverse of the estimated covariance matrix.
loglik     The (unpenalised) log-likelihood at the optimum.
loglikpen  The (penalised) log-likelihood at the optimum.
it         The number of iterations needed to reach convergence.

If a vector of values of lambda is provided, the output is a list in which each entry is itself a list, structured as above, associated with the corresponding value of lambda.
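When a vector of lambda values is supplied, the nested list can be summarised with sapply. The sketch below uses a hand-built mock_path that only mimics the structure described above. Its values are illustrative and are not output produced by gicf.

```r
# Mock path with the structure described above (illustrative values only,
# not output produced by gicf); only the sigma and it elements are mocked
mock_path <- list(
  list(sigma = matrix(c(2, 0.5, 0.5, 2), 2, 2), it = 12),
  list(sigma = diag(2, 2), it = 7)
)

# Number of non-zero off-diagonal pairs in each estimate
n_nonzero <- sapply(mock_path, function(f) sum(f$sigma[upper.tri(f$sigma)] != 0))

# Iterations used for each value of lambda
iters <- sapply(mock_path, function(f) f$it)
```

The same two sapply calls should apply unchanged to a real path returned for a vector lambda, assuming the elements are named as listed above.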

References

Chaudhuri, S., M. Drton, and T. S. Richardson (2007). Estimation of a covariance matrix with zeros. Biometrika 94 (1), 199–216.

Cibinel, L., A. Roverato, and V. Vinciotti (2024). A unified approach to penalized likelihood estimation of covariance matrices in high dimensions. arXiv, arXiv:2410.02403.

Bien, J. and R. J. Tibshirani (2011). Sparse estimation of a covariance matrix. Biometrika 98 (4), 807–820.

Wang, H. (2014). Coordinate descent algorithm for covariance graphical lasso. Statistics and Computing 24, 521–529.

Warton, D. I. (2008). Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association 103 (481), 340–349.

Examples

# An example with a banded covariance matrix
library(mvtnorm)

set.seed(1234)

p <- 10
n <- 500

# Create banded covariance matrix with three bands
band1 <- cbind(1:(p - 1), 2:p)
band2 <- cbind(1:(p - 2), 3:p)
band3 <- cbind(1:(p - 3), 4:p)
idxs <- rbind(band1, band2, band3)

Sigma <- matrix(0, p, p)
Sigma[idxs] <- 0.5
Sigma <- Sigma + t(Sigma)
diag(Sigma) <- 2

# Generate data
data <- rmvnorm(n, sigma = Sigma)

# Fit a path of estimates
lambdas <- seq(0, 0.15, 0.01)
fit <- gicf(data, lambda = lambdas, kappa = 0.1)

# Explore one particular estimate
onefit <- fit[[5]]
image(onefit$sigma != 0)

# Redo the fit, this time enforcing the true sparsity pattern (the zeros of Sigma)
fit2 <- gicf(data, lambda = lambdas, kappa = 0.1, adj = Sigma)

onefit2 <- fit2[[5]]
image(onefit2$sigma != 0)

[Package gicf version 1.0]