mainVEM {ppsbm}R Documentation

Adaptive VEM algorithm

Description

Principal adaptive VEM algorithm for histogram (with model selection) or for kernel method.

Usage

mainVEM(
  data,
  n,
  Qmin,
  Qmax = Qmin,
  directed = TRUE,
  sparse = FALSE,
  method = c("hist", "kernel"),
  init.tau = NULL,
  cores = 1,
  d_part = 5,
  n_perturb = 10,
  perc_perturb = 0.2,
  n_random = 0,
  nb.iter = 50,
  fix.iter = 10,
  epsilon = 1e-06,
  filename = NULL
)

Arguments

data

Data format depends on the estimation method used!!

  1. Data with hist method - List with 2 components:

    data$Time

    Positive real number. [0,data$Time] is the total time interval of observation

    data$Nijk

    Data matrix with counts per process N_{ij} and sub-intervals ; matrix of size N*K where N = n(n-1) or n(n-1)/2 is the number of possible node pairs in the graph and K = 2^{dmax} is the size of the finest partition in the histogram approach.

    Counts are pre-computed - Obtained through function statistics on data with second format and using a number of subintervals K as a power of 2.

  2. Data with kernel method - List with 3 components:

    data$time.seq

    Vector of the time points of the events (size M).

    data$type.seq

    Vector of the corresponding node pair indexes in format output by convertNodePair of the events (same size M).

    data$Time

    [0,data$Time] is the total time interval of observation

n

Total number of nodes, 1\le i \le n

Qmin

Minimum number of groups

Qmax

Maximum number of groups

directed

Boolean for directed (TRUE) or undirected (FALSE) case

sparse

Boolean for sparse (TRUE) or not sparse (FALSE) case

method

Either hist for histogram method or kernel for kernel method

Beware: hist is recommended (much faster). You can obtain smooth estimated intensities by using kernelIntensities on the output of the hist method.

init.tau

List of initial values of \tau - all tau's are matrices with size Q\times n (might be with different values of Q)

cores

Number of cores for parallel execution

If set to 1 it does sequential execution

Beware: parallelization with fork (multicore) : doesn't work on Windows!

d_part

Maximal level for finest partition of time interval [0,T] used for k-means initializations.

  • Algorithm takes partition up to depth 2^d with d=1,...,d_{part}

  • Explore partitions [0,T], [0,T/2], [T/2,T], ... [0,T/2^d], ...[(2^d-1)T/2^d,T]

  • Total number of partitions npart= 2^{(d_{part} +1)} -1

n_perturb

Number of different perturbations on k-means result

When Qmin < Qmax, number of perturbations on the result with Q-1 or Q+1 groups

perc_perturb

Percentage of labels that are to be perturbed (= randomly switched)

n_random

Number of completely random initial points. The total number of initializations for the VEM is npart*(1+n_{perturb}) +n_{random}

nb.iter

Number of iterations of the VEM algorithm

fix.iter

Maximum number of iterations of the fixed point into the VE step

epsilon

Threshold for the stopping criterion of VEM and fixed point iterations

filename

Name of the file where to save the results along the computation (increasing steps for Q, these are the longest).

The file will contain a list of 'best' results.

Details

The sparse version works only for the histogram approach.

Value

The function outputs a list of Qmax-Qmin+1 components. Each component is the solution obtained for a number of clusters Q, with Qmin\le Q \le Qmax and is a list of 8 elements:

References

DAUDIN, J.-J., PICARD, F. & ROBIN, S. (2008). A mixture model for random graphs. Statist. Comput. 18, 173–183.

DEMPSTER, A. P., LAIRD, N. M. & RUBIN, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39, 1–38.

JORDAN, M., GHAHRAMANI, Z., JAAKKOLA, T. & SAUL, L. (1999). An introduction to variational methods for graphical models. Mach. Learn. 37, 183–233.

MATIAS, C., REBAFKA, T. & VILLERS, F. (2018). A semiparametric extension of the stochastic block model for longitudinal networks. Biometrika. 105(3): 665-680.

MATIAS, C. & ROBIN, S. (2014). Modeling heterogeneity in random graphs through latent space models: a selective review. Esaim Proc. & Surveys 47, 55–74.

Examples

# load data of a synthetic graph with 50 individuals and 3 clusters
n <- 20
Q <- 3

Time <- generated_Q3_n20$data$Time
data <- generated_Q3_n20$data
z <- generated_Q3_n20$z

step <- .001
x0 <- seq(0,Time,by=step)
intens <-  generated_Q3_n20$intens

# VEM-algo kernel
sol.kernel <- mainVEM(data,n,Q,directed=FALSE,method='kernel', d_part=0,
    n_perturb=0)[[1]]
# compute smooth intensity estimators
sol.kernel.intensities <- kernelIntensities(data,sol.kernel$tau,Q,n,directed=FALSE)
# eliminate label switching
intensities.kernel <- sortIntensities(sol.kernel.intensities,z,sol.kernel$tau,
    directed=FALSE)

# VEM-algo hist
# compute data matrix with precision d_max=3 (ie nb of parts K=2^{d_max}=8).
K <- 2^3
Nijk <- statistics(data,n,K,directed=FALSE)
sol.hist <- mainVEM(list(Nijk=Nijk,Time=Time),n,Q,directed=FALSE, method='hist',
    d_part=0,n_perturb=0,n_random=0)[[1]]
log.intensities.hist <- sortIntensities(sol.hist$logintensities.ql,z,sol.hist$tau,
     directed=FALSE)

# plot estimators
par(mfrow=c(2,3))
ind.ql <- 0
for (q in 1:Q){
  for (l in q:Q){
    ind.ql <- ind.ql + 1
    true.val <- intens[[ind.ql]]$intens(x0)
    values <- c(intensities.kernel[ind.ql,],exp(log.intensities.hist[ind.ql,]),true.val)
    plot(x0,true.val,type='l',xlab=paste0("(q,l)=(",q,",",l,")"),ylab='',
        ylim=c(0,max(values)+.1))
    lines(seq(0,1,by=1/K),c(exp(log.intensities.hist[ind.ql,]),
        exp(log.intensities.hist[ind.ql,K])),type='s',col=2,lty=2)
    lines(seq(0,1,by=.001),intensities.kernel[ind.ql,],col=4,lty=3)
  }
}


[Package ppsbm version 1.0.0 Index]