predict.BayesECM {ezECM} | R Documentation |
New Event Categorization With Bayesian Inference
Description
New Event Categorization With Bayesian Inference
Usage
## S3 method for class 'BayesECM'
predict(object, Ytilde, thinning = 1, mixture_weights = "training", ...)
Arguments
object |
an object of |
Ytilde |
|
thinning |
integer, scalar. Values greater than one can be provided to reduce computation time. See details. |
mixture_weights |
character string describing the weights of the distributions in the mixture to be used for prediction. The default, |
... |
not used |
Details
The data in Ytilde should be p-values \in (0,1]. The transformation applied to the data used to generate object is automatically applied to Ytilde within the predict.BayesECM() function.
For a given event with an unknown category, a Bayesian ECM model seeks to predict the expected value of the latent variable \tilde{\mathbf{z}}_K, where \tilde{\mathbf{z}}_K is a vector of length K, and K is the number of event categories. A single observation of \tilde{\mathbf{z}}_K is a draw from a categorical distribution.
The expected probabilities stipulated within the categorical distribution of \tilde{\mathbf{z}}_K are conditioned on any imputed missing data, the prior hyperparameters, and each row of Ytilde individually. The output from predict.BayesECM() consists of draws from the distribution of \mathbf{E}[\tilde{\mathbf{z}}_K|\tilde{\mathbf{y}}_{\tilde{p}}, \mathbf{Y}^{+}, \mathbf{\eta}, \mathbf{\Psi}, \mathbf{\nu}, \mathbf{\alpha}] = p(\tilde{\mathbf{z}}_K|\tilde{\mathbf{y}}_{\tilde{p}}, \mathbf{Y}^{+}, \mathbf{\eta}, \mathbf{\Psi}, \mathbf{\nu}, \mathbf{\alpha}), where \mathbf{Y}^{+} represents the observed values within the training data.
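The equality between the expected value of the one-hot vector \tilde{\mathbf{z}}_K and its category probabilities can be checked numerically. The following is a minimal sketch in base R, not code from ezECM; the probability vector probs and its category names are hypothetical values chosen for illustration.

```r
# Illustrative only: a hypothetical probability vector for K = 3 categories.
set.seed(1)
probs <- c(cat1 = 0.2, cat2 = 0.5, cat3 = 0.3)

# Each column of the rmultinom() result is one one-hot draw of z (size = 1),
# i.e. a single draw from a categorical distribution.
z_draws <- rmultinom(n = 10000, size = 1, prob = probs)

# Averaging the one-hot draws estimates E[z], which approximates probs.
z_mean <- rowMeans(z_draws)
print(round(z_mean, 2))
```

Because each draw has exactly one element equal to one, the mean of element k over many draws is the fraction of draws assigned to category k, which is why the expected value can be read as a probability.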
The argument mixture_weights controls the value of p(\tilde{\mathbf{z}}_K|\mathbf{Y}_{N \times p}, \mathbf{\alpha}), the probability of each \tilde{z}_k = 1, before \tilde{\mathbf{y}}_{\tilde{p}} is observed. The standard result is obtained from the prior hyperparameter values in \mathbf{\alpha} and the number of unique events in each \mathbf{Y}_{N_k \times p}. Setting mixture_weights = "training" uses this standard result in prediction. If the number of events used for each category in training is thought to be problematic, providing the argument mixture_weights = "equal" sets p(\tilde{z}_1 = 1|\mathbf{Y}_{N \times p}) = \dots = p(\tilde{z}_K = 1|\mathbf{Y}_{N \times p}) = 1/K. If the user wants to use a set of p(\tilde{z}_k = 1|\mathbf{Y}_{N \times p}) which are not equal but also not informed by the data, we suggest setting the elements of the hyperparameter vector \mathbf{\alpha} to values with a large magnitude and in the desired ratios for each category. However, this can cause undesirable results in prediction if some elements of \mathbf{\alpha} are orders of magnitude larger than others.
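The three weighting options described above can be sketched numerically. This sketch assumes the "standard result" is the usual Dirichlet-categorical one, in which each weight is proportional to \alpha_k + N_k; that form is an assumption here and is not taken from the package source. The category names, alpha values, and counts N_k are hypothetical.

```r
# Hypothetical values: K = 3 categories, prior vector alpha, training counts N_k.
alpha <- c(cat1 = 1, cat2 = 1, cat3 = 1)
N_k   <- c(cat1 = 80, cat2 = 15, cat3 = 5)

# Assumed Dirichlet-categorical result: weights proportional to alpha_k + N_k.
w_training <- (alpha + N_k) / sum(alpha + N_k)

# Equal weights 1/K, as described for mixture_weights = "equal".
w_equal <- rep(1 / length(alpha), length(alpha))
names(w_equal) <- names(alpha)

# Large alpha in a 2:1:1 ratio swamps the counts, so the weights approach
# the prior ratios 0.5, 0.25, 0.25 regardless of the training frequencies.
alpha_big <- c(cat1 = 2e6, cat2 = 1e6, cat3 = 1e6)
w_prior_driven <- (alpha_big + N_k) / sum(alpha_big + N_k)

print(round(rbind(w_training, w_equal, w_prior_driven), 4))
```

Under this assumed form, the training counts stop mattering once every element of \mathbf{\alpha} is much larger than every N_k, which is the mechanism behind the suggestion to use large values in the desired ratios.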
To save computation time, the user can specify an integer value for thinning greater than one. Only every thinning-th Markov chain Monte Carlo sample is then used for prediction. This lets the user take a large number of samples during the training step, allowing for better mixing, without paying the full cost at prediction time. For details, see the package vignette by running vignette("syn-data-code", package = "ezECM").
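Thinning amounts to index selection over the stored samples. The sketch below uses base R on a hypothetical vector of draws; it is not the package's internal code, and starting the selection at index thinning (rather than at index 1) is an assumption.

```r
# Hypothetical stand-in for stored MCMC draws: 1000 samples of one parameter.
set.seed(42)
samples <- rnorm(1000)
thinning <- 5L

# Keep every thinning-th sample. Starting at index `thinning` is an assumption;
# the package may start from a different index.
keep <- seq(from = thinning, to = length(samples), by = thinning)
thinned <- samples[keep]

# Number of samples that would be used for prediction after thinning.
print(length(thinned))
```

With thinning = 5, one fifth of the stored samples are retained, so the work done per new event at prediction time drops by roughly the same factor.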
Value
Returns a list. The list element epz is a matrix with nrow(Ytilde) rows, one for each event used for prediction, and K named columns. Each column of epz holds the expected probability of the corresponding category for the event in that row. The remaining list elements hold data including Ytilde, information about additional variables passed to predict.BayesECM, and data related to the previous BayesECM() fit.
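One way to summarize a matrix shaped like epz is to report the most probable category for each event. The sketch below builds a small hypothetical matrix with the layout described above (events in rows, K named category columns) instead of calling the package; the values and column names are invented, and the rows are assumed to sum to one.

```r
# Hypothetical epz-like matrix: 3 new events (rows), K = 3 named categories.
epz <- matrix(c(0.7, 0.2, 0.1,
                0.1, 0.3, 0.6,
                0.4, 0.5, 0.1),
              nrow = 3, byrow = TRUE,
              dimnames = list(NULL, c("cat1", "cat2", "cat3")))

# In this example each row is a set of category probabilities summing to one.
stopifnot(all(abs(rowSums(epz) - 1) < 1e-8))

# Most probable category per event, and the probability attached to it.
best_category <- colnames(epz)[max.col(epz, ties.method = "first")]
best_prob <- apply(epz, 1, max)

print(data.frame(event = seq_len(nrow(epz)), best_category, best_prob))
```

Reporting best_prob alongside the label keeps weakly supported assignments visible, such as the third event here, whose top two categories are close.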
Examples
# Load the example training p-values shipped with the package
csv_use <- "good_training.csv"
file_path <- system.file("extdata", csv_use, package = "ezECM")
training_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE)
# Fit the Bayesian ECM model to the training data
trained_model <- BayesECM(Y = training_data, BT = c(10,1000))
# Load the example p-values for the new events
csv_use <- "good_newdata.csv"
file_path <- system.file("extdata", csv_use, package = "ezECM")
new_data <- import_pvals(file = file_path, header = TRUE, sep = ",", training = TRUE)
# Predict category probabilities for each row of the new data
bayes_pred <- predict(trained_model, Ytilde = new_data)