viralpreds {viralmodels} | R Documentation
Predict Viral Load or CD4 Count using Many Models
Description
This function predicts viral load or CD4 count values using multiple machine learning models and cross-validation. Users choose one of two prediction types: normal predictions on the full dataset or observation-by-observation (obs-by-obs) predictions.
Usage
viralpreds(output, semilla, data, prediction_type = "full")
Arguments
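output | A non-ranked viraltab output, that is, the result of viraltab() called with rank_output = FALSE.
semilla | An integer specifying the seed for random number generation to ensure reproducibility.
data | A data frame containing the predictors and the target variable.
prediction_type | A character string specifying the type of predictions to perform. The default, "full", gives normal predictions on the full dataset; the alternative is observation-by-observation (obs-by-obs) predictions.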
Value
A list containing two elements: predictions (a vector of predicted values for the target variable) and RMSE (the root mean square error of the best model).
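To make the shape of the return value concrete, here is a minimal base R sketch of the RMSE formula and of how a list with these two elements can be read. The helper rmse(), the object result, and all numbers are hypothetical. Only the element names predictions and RMSE come from the text above, and the sketch makes no claim about how viralpreds computes its RMSE internally.

```r
# Root mean square error between observed and predicted values.
# `rmse` is a hypothetical helper, not a function of viralmodels.
rmse <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  sqrt(mean((observed - predicted)^2))
}

# Hypothetical stand-in for a viralpreds() result: the numbers are made up,
# only the element names `predictions` and `RMSE` follow the Value section.
observed <- c(350, 420, 510, 600)
result <- list(predictions = c(340, 430, 500, 620), RMSE = NA_real_)
result$RMSE <- rmse(observed, result$predictions)

result$predictions  # vector of predicted values
result$RMSE         # a single number
```

With a real viralpreds result, the same `$predictions` and `$RMSE` accessors would apply, assuming the list is named as described above.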
Examples
library(viralmodels)
library(dplyr)
library(magrittr)
library(baguette)
library(kernlab)
library(kknn)
library(ranger)
library(rules)
library(glmnet)
# Define the function to impute values in the undetectable range
set.seed(123)
impute_undetectable <- function(column) {
  ifelse(column <= 40,
         rexp(sum(column <= 40), rate = 1/13) + 1,
         column)
}
# Load the example data from the viraldomain package
library(viraldomain)
data("viral", package = "viraldomain")
# Apply the function to all vl columns using dplyr's across()
viral_imputed <- viral %>%
  mutate(across(starts_with("vl"), ~impute_undetectable(.x)))
traindata <- viral_imputed
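target <- "cd_2022"                              # outcome to predict
viralvars <- c("vl_2019", "vl_2021", "vl_2022")  # viral load predictors
logbase <- 10      # base of the logarithmic transformation
pliegues <- 5      # number of cross-validation folds
repeticiones <- 2  # number of cross-validation repeats
rejilla <- 2       # size of the tuning grid
semilla <- 123     # random seed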
viraltab(traindata, semilla, target, viralvars, logbase, pliegues,
         repeticiones, rejilla, rank_output = FALSE) %>%
  viralpreds(semilla, traindata, prediction_type = "full")
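The imputation step in the example can be checked in isolation. The sketch below repeats the helper so that it runs on its own and applies it to a small, made-up vector of viral load readings (the vector vl is hypothetical and not taken from the viral dataset). It only illustrates what the helper does to values at or below the threshold of 40.

```r
# Same helper as in the example above, repeated so this sketch is self-contained.
impute_undetectable <- function(column) {
  ifelse(column <= 40,
         rexp(sum(column <= 40), rate = 1/13) + 1,
         column)
}

set.seed(123)
# Hypothetical viral load readings; two of them are at or below 40.
vl <- c(20, 40, 150, 1200)
vl_imputed <- impute_undetectable(vl)

# Readings above 40 pass through unchanged.
vl_imputed[vl > 40]
# Readings at or below 40 are replaced by 1 plus an exponential draw
# with mean 13, so they exceed 1 but are not capped at 40.
vl_imputed[vl <= 40]
```

Because the replacement values are random, set.seed() is called before the helper to make the imputed columns reproducible, as in the example.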