data_org {bmabart} | R Documentation |
Prepare Variables for Bayesian Mediation Analysis with BART
Description
Reads in the exposure, mediators, outcome, and covariates, and transforms them into formats suitable for fitting BART.
Usage
data_org(pred, m, y, refy = rep(NA, ncol(data.frame(y))),
predref = rep(NA, ncol(data.frame(pred))), deltap = NA,
deltam = NA, mref = rep(NA, ncol(data.frame(m))), cova = NULL,
cova.ref = list(), mcov = NULL, mcov.ref = list(), mclist = NULL,
complete = FALSE)
Arguments
pred |
The vector/matrix of the exposure/predictor variable(s). |
m |
The data frame of all potential mediators. |
y |
The vector/matrix of the outcome(s). |
refy |
The reference groups of y when the corresponding outcome is binary or categorical. |
predref |
The reference groups of pred when the corresponding predictor is binary or categorical. |
deltap |
A vector whose length is the number of exposures. The ith item is the difference in the ith predictor used when calculating the rate of change with respect to that predictor. If not set, the difference is 1 for a categorical predictor and one tenth of the standard deviation of the predictor if it is continuous. |
deltam |
A vector whose length is the number of mediators. The ith item is the difference in the ith mediator used when calculating the rate of change with respect to that mediator. If not set, the difference is 1 for categorical mediators and one tenth of the standard deviation of the mediator if it is continuous. |
mref |
The reference groups of mediators when the corresponding mediator is binary or categorical. |
cova |
The covariate data for y. |
cova.ref |
The reference groups for the binary or categorical covariates in cova. |
mcov |
The covariate data for the mediators. |
mcov.ref |
The reference groups for the binary or categorical covariates in mcov. |
mclist |
If mclist is NULL but mcov is not, mcov is applied to all mediators. If neither mcov nor mclist is NULL, the first item of mclist lists the mediators that use a different set of covariates, and the following items give the columns of mcov for those mediators in order, with NA if no covariates are to be used. For example, with mclist=list(c(1,2,4),l1=1,l2=NA,l4=c(1,3)), mediator 1, m[,1], uses mcov[,1]; mediator 2 uses no covariates; mediator 4 uses mcov[,c(1,3)]; and all other mediators use all columns of mcov. Variable names can also be used in place of column numbers in mclist. |
complete |
Set complete=TRUE to use only complete cases in the analysis. |
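The mclist convention described above can be illustrated with a small sketch. The helper mclist_to_mind below is hypothetical and is not part of bmabart; it only assumes the documented rules (mediators not listed in the first item use every column of mcov, NA means no covariates) and builds a logical indicator matrix with one row per mediator and one column per column of mcov.

```r
# Hypothetical helper (not a bmabart function): expand an mclist into an
# indicator matrix, one row per mediator and one column per mcov column.
mclist_to_mind <- function(mclist, n_med, n_mcov) {
  # Default: every mediator uses every column of mcov.
  mind <- matrix(TRUE, nrow = n_med, ncol = n_mcov)
  if (is.null(mclist)) return(mind)
  special <- mclist[[1]]        # mediators with their own covariate set
  for (k in seq_along(special)) {
    cols <- mclist[[k + 1]]     # mcov columns for the k-th listed mediator
    mind[special[k], ] <- FALSE
    if (!all(is.na(cols))) mind[special[k], cols] <- TRUE
  }
  mind
}

# The example from the mclist description, assuming 5 mediators and 3 mcov columns.
mclist <- list(c(1, 2, 4), l1 = 1, l2 = NA, l4 = c(1, 3))
mind <- mclist_to_mind(mclist, n_med = 5, n_mcov = 3)
mind
```

In this sketch, row 1 selects only the first column of mcov, row 2 selects none, row 4 selects columns 1 and 3, and rows 3 and 5 select all columns. The sketch handles column numbers only, not variable names.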
Details
The function organizes the input data into formats readable by the BART package for building BART models. It also recognizes the type of the response variable(s), so that the appropriate functions and methods are used for the mediation effect inferences.
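As a rough illustration of what turning a categorical variable into a dummy design matrix with a reference group can look like, here is a minimal base-R sketch. It uses relevel() and model.matrix() on a made-up data frame; whether data_org uses these functions internally is an assumption, and the variable names and the reference level "OTHER" are chosen only for the example.

```r
# Toy mediators: one continuous, one categorical (made-up data).
m <- data.frame(
  age  = c(31, 45, 52, 38),
  race = factor(c("OTHER", "ASIAN", "OTHER", "BLACK"))
)

# Assumed reference group for the categorical mediator.
m$race <- relevel(m$race, ref = "OTHER")

# model.matrix() creates k-1 dummy columns for a k-level factor,
# omitting the reference level; drop the intercept column.
m2 <- model.matrix(~ age + race, data = m)[, -1, drop = FALSE]

# Start and end columns of the categorical variable's dummies in m2.
cat_cols <- grep("^race", colnames(m2))
catm2 <- matrix(range(cat_cols), nrow = 1)

colnames(m2)
catm2
```

Here the three-level factor becomes two dummy columns, and the one-row matrix records where those columns start and end in the expanded matrix.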
Value
Returns the cleaned-up dataset, organized by variable type and ready for the Bayesian mediation analysis.
N |
The total number of observations. |
y_type |
The format of the response variable(s): 1 for continuous, 2 binary, 3 categorical, and 4 time-to-event. It is the same length as the number of outcomes. |
y |
The original y with observations of missing data removed, if complete=T. |
y1 |
The outcome variables, where binary or categorical variables are replaced with a dummy design matrix. |
cova |
The covariates for y, where binary or categorical variables are replaced with a dummy design matrix. |
npred |
The number of predictors/exposures, where a categorical exposure of k levels has k-1 dummy predictors. |
nm |
The number of original mediators, ncol(m). |
mcov |
Reformatted mcov. |
mind |
If mcov is not NULL, mind is a matrix of dimension (number of mediators) x ncol(mcov); cell (i,j) indicates whether the jth column of mcov is used for mediator i in m1. |
pred1 |
The original pred with observations of missing data removed, if complete=T. |
pred2 |
pred1 with all categorical or binary variables turned into dummies. |
binpred1 |
The column numbers of binary predictors in pred1. |
binpred2 |
The column numbers of binary predictors in pred2. |
catpred1 |
The column numbers of categorical predictors in pred1. |
catpred2 |
The column numbers of categorical predictors in pred2. |
contpred1 |
The column numbers of continuous predictors in pred1. |
contpred2 |
The column numbers of continuous predictors in pred2. |
m1 |
The original m with observations of missing data removed, if complete=T. |
m2 |
m1 with all categorical or binary variables turned into dummies. |
m3.1 |
m2 with deltam[i]/2 subtracted from each continuous variable, where i indexes the ith mediator. |
m3.2 |
m2 with deltam[i]/2 added to each continuous variable, where i indexes the ith mediator. |
p1 |
The number of continuous mediators. |
p2 |
The number of binary mediators. |
p3 |
The number of categorical mediators. |
binm1 |
The column number of binary mediators in m1. |
binm2 |
The column number of binary mediators in m2. |
catm1 |
The column number of categorical mediators in m1. |
catm2 |
A matrix with one row per categorical mediator, in the order of catm1. Each row gives the start (first column) and end (second column) column numbers of that variable's design matrix in m2. |
contm1 |
The column number of continuous mediators in m1. |
contm2 |
The column number of continuous mediators in m2. |
deltap |
A vector whose length is the number of exposures. The ith item is the difference in the ith predictor used when calculating the rate of change with respect to that predictor. If not set, the difference is 1 for a categorical predictor and one tenth of the standard deviation of the predictor if it is continuous. |
deltam |
A vector whose length is the number of mediators. The ith item is the difference in the ith mediator used when calculating the rate of change with respect to that mediator. If not set, the difference is 1 for categorical mediators and one tenth of the standard deviation of the mediator if it is continuous. |
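The relationship between deltam, m2, m3.1, and m3.2 described above can be sketched in base R. This is a toy reconstruction from the descriptions only, not the package's code; the data are made up, and the sketch assumes no categorical mediators, so that columns of m2 line up one-to-one with mediators.

```r
set.seed(1)

# Toy m2: one continuous mediator and one binary mediator (made-up data).
m2 <- data.frame(
  age    = rnorm(6, mean = 40, sd = 10),
  smoker = c(0, 1, 0, 1, 1, 0)
)
contm2 <- 1  # column number of the continuous mediator in m2

# Documented defaults: sd/10 for a continuous mediator, 1 otherwise.
deltam <- c(sd(m2$age) / 10, 1)

# Shift only the continuous columns by -deltam/2 and +deltam/2.
m3.1 <- m2
m3.2 <- m2
for (j in contm2) {
  m3.1[, j] <- m2[, j] - deltam[j] / 2
  m3.2[, j] <- m2[, j] + deltam[j] / 2
}

# The two shifted copies differ by exactly deltam in the continuous column.
m3.2$age - m3.1$age
```

The binary column is left unchanged in both copies; only the continuous column moves, by a total of deltam between m3.1 and m3.2.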
Note
data_org is run within the bma.bart function. Users do not have to run data_org separately.
Author(s)
Qingzhao Yu and Bin Li
References
Yu, Q., and Li, B. (2025) <doi:>. "Mediation Analysis with Bayesian Additive Regression Trees," submitted.
Examples
data("weight_behavior")
#binary predictor
try0= data_org(pred=weight_behavior[,3], m=weight_behavior[,c(2,4:14)],
y=weight_behavior[,15], refy = 0, predref = "F")
#add covariate for mediators
try1= data_org(pred=weight_behavior[,3], m=weight_behavior[,c(2,4:13)],
mcov=weight_behavior[,14], mclist=append(list(var=1:10),rep(NA,10)),
#"sweater" is used as a cov for "exercises" only
y=weight_behavior[,15], refy = 0, predref = "F") #,complete=T
#multiple predictors
try2= data_org(pred=weight_behavior[,4], m=weight_behavior[,c(2:3,5:14)],
y=weight_behavior[,15], refy = 0, predref = "OTHER")
try3= data_org(pred=weight_behavior[,c(1,4)], m=weight_behavior[,c(2:3,5:14)],
y=weight_behavior[,15], refy = 0, predref = "OTHER")
#continuous y
try4= data_org(pred=weight_behavior[,4], m=weight_behavior[,c(2:3,5:14)],
y=weight_behavior[,1], refy = 0, predref = "OTHER")
#categorical y
try5= data_org(pred=weight_behavior[,1], m=weight_behavior[,c(2:3,5:14)],
y=weight_behavior[,4], refy = "", predref = "OTHER")
#add covariates for y and for mediators
try6= data_org(pred=weight_behavior[,4], m=weight_behavior[,c(5:12)],
cova=weight_behavior[,2:3],mcov=weight_behavior[,13:14],
mclist=c(list(var=1:7),rep(NA,6),list(1)),
y=weight_behavior[,1], refy = 0, predref = "OTHER")
#time-to-event outcome
library(survival) #provides Surv()
data(cgd1) #a dataset in the survival package
x=cgd1[,c(4:5,7:12)]
pred=cgd1[,6]
status<-ifelse(is.na(cgd1$etime1),0,1)
y=Surv(cgd1$futime,status)
#for continuous predictor
try7<-data_org(pred=pred,m=x,y=y)