FactorHet_mbo_control {FactorHet}    R Documentation

Control for model-based optimization

Description

FactorHet_mbo_control adjusts the settings for model-based optimization (MBO), which is used to select the regularization parameter. All arguments have default values. Many of these settings rely on options from the mlrMBO package, so please see that package for a more detailed discussion.

Usage

FactorHet_mbo_control(
  mbo_type = c("sparse", "ridge"),
  mbo_initialize = "mm_mclust_prob",
  mm_init_iterations = NULL,
  mbo_range = c(-5, 0),
  mbo_method = "regr.bgp",
  final_method = "best.predicted",
  iters = 11,
  mbo_noisy = TRUE,
  criterion = c("BIC", "AIC", "GCV", "BIC_group"),
  ic_method = c("EM", "IRLS", "free_param"),
  se_final = TRUE,
  mbo_design = -1.5,
  fast_estimation = NULL,
  verbose = FALSE
)

Arguments

mbo_type

A character argument indicating the type of model to estimate. The default, "sparse", uses the structured sparse penalty described in Goplerud et al. (2025) and in FactorHet. "ridge" uses a ridge (L2) penalty instead.

mbo_initialize

A character value giving the initialization method used for each MBO proposal. The default is "mm_mclust_prob". "Details" provides a more in-depth discussion.

mm_init_iterations

An integer giving the number of iterations to use if Murphy/Murphy initialization is used. The default, NULL, uses 100 iterations for probabilistic (soft-assignment) tuning and 50 for deterministic (hard-assignment) tuning. "Details" provides a more in-depth discussion.

mbo_range

A numeric vector of length two giving the range of values of log10(lambda) to consider, before standardization (e.g., scaling by N; see FactorHet_control). The default is c(-5, 0). "Details" provides more information.

mbo_method

A character value naming the mlr learner used as the surrogate model to propose new values of the regularization parameter. See the documentation for mlr for more details. The default is "regr.bgp", which requires the tgp package to be installed.

final_method

A character argument that determines how the final regularization parameter is selected. The default, "best.predicted", uses the regularization parameter that is predicted to have the best value of the criterion. Alternatives include "last.proposed" and "best.true.y"; these are described in detail under final.method in makeMBOControl (mlrMBO).

iters

A non-negative integer giving the number of proposals to evaluate after the initial design. The default is 11.

mbo_noisy

A logical value for whether to treat the objective function as "noisy" for the purposes of model-based optimization. The default is TRUE; the "noisy_optimization" vignette from mlrMBO provides more details. Although the criterion function is not, in fact, noisy, this option often performs better because the criterion can be a non-smooth function of the regularization parameter. When TRUE, the infill criterion crit.eqi (expected quantile improvement) from mlrMBO is used instead of crit.ei (expected improvement).

criterion

A character value giving the criterion to minimize. Options are "BIC" (default), "AIC", "GCV", or "BIC_group". "BIC_group" uses the number of individuals, rather than the number of observations, as the sample size (e.g., when each person provides repeated observations).

ic_method

A character value giving the method used to calculate the degrees of freedom: "EM" (default), "IRLS", or "free_param". See FactorHet_control for more information.

se_final

A logical value for whether standard errors should be calculated for the final model. The default is TRUE.

mbo_design

Specifies how the initial MBO proposals are designed. The default is -1.5; this and the other options are described in "Details".

fast_estimation

Controls whether a weaker (looser) convergence criterion is used when evaluating the MBO proposals. The default, NULL, uses the same convergence settings for all models. "Details" provides more information.

verbose

A logical value; if TRUE, more information is reported on the initial steps of MBO. The default is FALSE.

Details

Initialization: FactorHet_mbo uses the same initialization for every MBO proposal. The default procedure ("mm_mclust_prob") is discussed in detail in the appendix of Goplerud et al. (2025) and builds on Murphy and Murphy (2020). In brief, it first deterministically initializes group memberships using only the moderators (e.g., using "mclust"). Starting from those memberships, it runs an EM algorithm with only the main effects for a few steps to update the proposed group memberships, using probabilistic assignment if "prob" is specified and hard assignment otherwise. If the warning "Murphy/Murphy initialization did not fully converge" appears, this initial tuning step did not fully converge. The number of iterations can be increased using mm_init_iterations if desired, although the benefits beyond the default settings are usually modest. The resulting memberships are then used to initialize the model at each proposed regularization value.

Other available options are "spectral" and "mclust", which apply spectral clustering or "mclust" to the moderators with no Murphy/Murphy-style tuning. Alternatively, "mm_spectral" and "mm_mclust" apply Murphy/Murphy tuning on top of the corresponding deterministic initialization ("spectral" or "mclust"). These use hard assignment at each step and will likely converge more quickly, although a hard initial assignment may not be desirable. Adding the suffix "_prob" to the "mm_*" options (e.g., "mm_mclust_prob") uses a standard soft-assignment EM algorithm during the Murphy/Murphy tuning.
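As a sketch, assuming the FactorHet package is installed, the following R code requests hard-assignment Murphy/Murphy tuning starting from a spectral initialization, with 100 tuning iterations instead of the deterministic default of 50. The vector init_options is written out by hand from the list above for illustration; it is not an object provided by the package. The call is skipped if FactorHet is unavailable.

```r
# Illustrative settings: hard-assignment Murphy/Murphy tuning on top of a
# spectral initialization, with more tuning iterations than the default (50).
init_method <- "mm_spectral"
init_iters <- 100L

# Option names as listed in "Details" (hand-written here for illustration).
init_options <- c("spectral", "mclust",
                  "mm_spectral", "mm_mclust",
                  "mm_spectral_prob", "mm_mclust_prob")
stopifnot(init_method %in% init_options)

if (requireNamespace("FactorHet", quietly = TRUE)) {
  ctrl <- FactorHet::FactorHet_mbo_control(
    mbo_initialize = init_method,
    mm_init_iterations = init_iters
  )
}
```

Switching init_method to "mm_spectral_prob" would instead use soft assignment during tuning, with a default of 100 iterations when mm_init_iterations is left as NULL.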

If one wishes to use a custom initialization for MBO, set mbo_initialize = NULL and provide an initialization via FactorHet_control. It is strongly advised that a manual initialization be deterministic, e.g., a list of initial assignment probabilities for each group.

Design of MBO Proposals: The MBO procedure works as follows. A set of initial proposals is first evaluated in terms of the criterion. Starting from these, iters further proposals attempt to improve the criterion using the methods described in detail in mlrMBO (Bischl et al. 2018). The default of 11 seems to work well, though one can use visualize_MBO after estimation to examine how the criterion varied across the proposals.

By default, the regularization parameter is searched over -5 to 0 on the log10 scale (i.e., lambda from 10^-5 to 1), before standardizing by the size of the dataset. We found this range to be reasonable, but it can be adjusted using mbo_range.
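A small worked example of the scale: the default range c(-5, 0) corresponds to lambda between 10^-5 and 1 before standardization. The narrower range c(-3, 0) below is a hypothetical choice that drops the smallest penalties (which yield the densest models); it is shown only to illustrate the argument, assuming FactorHet is installed.

```r
# mbo_range is on the log10(lambda) scale, before standardization.
default_range <- c(-5, 0)
lambda_bounds <- 10^default_range   # lambda from 1e-05 to 1

# Hypothetical narrower range that excludes the smallest (densest) penalties.
narrow_range <- c(-3, 0)
stopifnot(length(narrow_range) == 2, narrow_range[1] < narrow_range[2])

if (requireNamespace("FactorHet", quietly = TRUE)) {
  ctrl <- FactorHet::FactorHet_mbo_control(mbo_range = narrow_range)
}
```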

It is possible to calibrate the initial proposals to help the algorithm find a minimum of the criterion more quickly. This is controlled by mbo_design which accepts the following options. Note that a manual grid search can be provided using the data.frame option below.

Scalar:

By default, mbo_design is a scalar (-1.5) giving log10(lambda), before standardization as discussed in FactorHet_control. For a scalar value, four proposals are generated: they start from the scalar value and adjust it based on the level of sparsity of the initially estimated model. This attempts to avoid initial proposals that are too dense, and thus very slow to estimate, as well as ones that are too sparse.

"random":

If the string "random" is provided, this follows the default settings in mlrMBO and generates random proposals.

data.frame:

A custom grid can be provided using a data.frame with two columns, "l" and "y". "l" gives the proposed values on the log10(lambda) scale (before standardization). If the corresponding value of the criterion (e.g., BIC) is already known, for example from a prior run of the algorithm, "y" should contain this value; if it is unknown, set it to NA and it will be estimated. A manual grid search can therefore be done by creating a data.frame with the grid values in "l" and all "y" set to NA, and then setting iters = 0 so that no further proposals are evaluated after the grid.
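A minimal sketch of a manual grid search, assuming FactorHet is installed: the grid below (seven values from -4 to -1 on the log10(lambda) scale, chosen here only for illustration so that they lie inside the default mbo_range) has every "y" set to NA, and iters = 0 stops MBO once the grid has been evaluated. The resulting control list is intended for use with FactorHet_mbo.

```r
# Manual grid search over log10(lambda), before standardization.
# y = NA marks the criterion as unknown, so it will be estimated.
grid <- data.frame(
  l = seq(-4, -1, by = 0.5),
  y = NA_real_
)

if (requireNamespace("FactorHet", quietly = TRUE)) {
  # iters = 0: evaluate only the grid, with no further MBO proposals
  ctrl <- FactorHet::FactorHet_mbo_control(mbo_design = grid, iters = 0)
}
```

If some criterion values are already known from an earlier run, they can be entered in "y" for the matching rows instead of NA.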

Estimation: Typically, estimation uses the same settings for each MBO proposal and for the final model estimated at the best regularization value (see final_method for details). However, to speed up estimation, a looser convergence criterion can be used for the MBO proposals via the fast_estimation option. To do so, provide a named list with two members, "final" and "fast". Each should be a list with two elements, "tolerance.logposterior" and "tolerance.parameters", giving the corresponding convergence thresholds. "final" is used for the final model and "fast" is used for evaluating all of the MBO proposals.
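The structure of fast_estimation can be sketched as follows, assuming FactorHet is installed. The tolerance values are hypothetical and chosen only to show a looser "fast" setting alongside a stricter "final" one; suitable values will depend on the application.

```r
# Hypothetical tolerances: stricter for the final model, looser ("fast")
# for evaluating the MBO proposals. The values are illustrative only.
fast_est <- list(
  final = list(tolerance.logposterior = 1e-5, tolerance.parameters = 1e-5),
  fast  = list(tolerance.logposterior = 1e-3, tolerance.parameters = 1e-3)
)

if (requireNamespace("FactorHet", quietly = TRUE)) {
  ctrl <- FactorHet::FactorHet_mbo_control(fast_estimation = fast_est)
}
```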

Value

FactorHet_mbo_control returns a named list containing the elements listed in "Arguments".

References

Bischl, Bernd, Jakob Richter, Jakob Bossek, Daniel Horn, Janek Thomas, and Michel Lang. 2018. "mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions." arXiv preprint: https://arxiv.org/abs/1703.03373

Goplerud, Max, Kosuke Imai, and Nicole E. Pashley. 2025. "Estimating Heterogeneous Causal Effects of High-Dimensional Treatments: Application to Conjoint Analysis." arXiv preprint: https://arxiv.org/abs/2201.01357

Murphy, Keefe and Thomas Brendan Murphy. 2020. "Gaussian Parsimonious Clustering Models with Covariates and a Noise Component." Advances in Data Analysis and Classification 14:293–325.

Examples

str(FactorHet_mbo_control())

[Package FactorHet version 1.0.0 Index]