pinterval_parametric {pintervals}    R Documentation
Parametric prediction intervals for continuous predictions
Description
This function computes parametric prediction intervals with a confidence level of 1-alpha for a vector of (continuous) predicted values, using a user-specified parametric distribution and parameters. The distribution can be any distribution available in R, or a user-defined distribution, as long as a quantile function is available. Parameters describing the prediction error should be estimated on calibration data. The interval bounds are obtained from the quantile function of the distribution so that each interval covers probability 1-alpha.
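As a rough illustration of the idea, the sketch below builds normal intervals directly from base R's `qnorm`. It assumes the intervals are equal-tailed (lower bound at the alpha/2 quantile, upper bound at the 1 - alpha/2 quantile), which this page does not state explicitly; the helper `equal_tailed_interval` is hypothetical and not part of pintervals.

```r
# Hypothetical helper: equal-tailed interval from any quantile function.
# Assumes alpha is split evenly between the two tails.
equal_tailed_interval <- function(qfun, pars, alpha = 0.1) {
  lower <- do.call(qfun, c(list(p = alpha / 2), pars))
  upper <- do.call(qfun, c(list(p = 1 - alpha / 2), pars))
  data.frame(lower_bound = lower, upper_bound = upper)
}

pred <- c(1.5, 2.0, 3.2)
# Normal intervals centred on the predictions with a common error sd of 0.4
res <- equal_tailed_interval(qnorm, list(mean = pred, sd = 0.4), alpha = 0.1)
res
```

Because the normal distribution is symmetric, each interval here is centred on its prediction and has width 2 * 0.4 * qnorm(0.95).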
Usage
pinterval_parametric(
pred,
dist = c("norm", "lnorm", "pois", "nbinom", "gamma", "logis", "beta"),
pars = list(),
alpha = 0.1,
lower_bound = NULL,
upper_bound = NULL
)
Arguments
pred
Vector of predicted values.
dist
Distribution to use for the prediction intervals. Can be a character string naming any distribution available in R (e.g. "norm", "gamma") or a function representing a distribution. If a function is provided, it must be a quantile function (e.g. qnorm, qgamma).
pars
Named list of parameters for the distribution, given per prediction or as single values shared by all predictions. See Details for more information.
alpha
The significance level for the prediction intervals; the intervals have a confidence level of 1-alpha. Must be a single numeric value between 0 and 1.
lower_bound
Optional minimum value for the prediction intervals. If not provided, the minimum (true) value of the calibration partition will be used.
upper_bound
Optional maximum value for the prediction intervals. If not provided, the maximum (true) value of the calibration partition will be used.
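R's quantile functions recycle their arguments, so a `pars` list can mix a per-prediction vector with a single value estimated once on calibration data. The sketch below only demonstrates this base R behaviour; the `do.call` pattern is an assumption about how parameters might be passed by name, not a description of pintervals internals.

```r
pred <- c(0.8, 1.1, 2.5, 4.0)
# Per-prediction mean, one sd shared by all predictions
pars <- list(mean = pred, sd = 0.3)
# The scalar sd is recycled, giving one quantile per prediction
upper <- do.call(qnorm, c(list(p = 0.95), pars))
length(upper)  # 4
```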
Details
The distributions are not limited to the standard distributions available in R. Any distribution can be used as long as a quantile function is available. Users may write their own distribution functions and plug in the resulting quantile function, or create composite or mixture distributions, for instance with the 'mistr' package, and plug in their quantile function.
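For example, a location-scale Student t distribution, which has heavier tails than the normal, needs only a few lines of base R. The function name `qt_scaled` and its arguments are hypothetical; since `dist` accepts a quantile function, something like `dist = qt_scaled` with `pars = list(mean = pred, sd = ..., df = ...)` should be usable, but that call is not shown here.

```r
# Hypothetical user-defined quantile function: location-scale Student t.
# p comes first so it can be called like any other R quantile function.
qt_scaled <- function(p, mean = 0, sd = 1, df = 5) {
  mean + sd * qt(p, df = df)
}

pred  <- c(10, 12, 15)
# 90% equal-tailed bounds with scale 1.5 and 4 degrees of freedom
lower <- qt_scaled(0.05, mean = pred, sd = 1.5, df = 4)
upper <- qt_scaled(0.95, mean = pred, sd = 1.5, df = 4)
cbind(pred, lower, upper)
```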
The list of parameters should be constructed such that calling the quantile function with these parameters returns a vector of the same length as the predictions. In most cases, the parameters should ensure that the predicted value corresponds to the mean, median, or mode of the resulting distribution. Parameters relating to the prediction error should be estimated on calibration data. For example, for normal prediction intervals, the mean parameter should be the predicted value and the standard deviation parameter should be the estimated standard deviation of the prediction errors in the calibration set. For a negative binomial distribution with a fixed size parameter, the size parameter should be estimated on the calibration data and the mu parameter should be the predicted value.
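The negative binomial case can be sketched with a method-of-moments estimate of the size parameter. This is one possible estimator, chosen here for illustration (a maximum-likelihood alternative is `MASS::theta.ml`). The calibration data are simulated, and the calibration predictions `mu_cal` are assumed to equal the true means; with a real model they would be the model's calibration predictions.

```r
set.seed(1)
# Simulated calibration set: predicted means and observed counts
mu_cal <- runif(500, 2, 20)
y_cal  <- rnbinom(500, size = 3, mu = mu_cal)

# Method of moments: Var(Y) = mu + mu^2 / size, so
# E[(y - mu)^2 - mu] = mu^2 / size for each observation
size_hat <- sum(mu_cal^2) / sum((y_cal - mu_cal)^2 - mu_cal)

# 90% intervals for new predictions: mu is the prediction, size is fixed
pred  <- c(4, 8, 15)
alpha <- 0.1
lower <- qnbinom(alpha / 2, size = size_hat, mu = pred)
upper <- qnbinom(1 - alpha / 2, size = size_hat, mu = pred)
```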
Value
A tibble with the predicted values and the lower and upper bounds of the prediction intervals
Examples
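library(pintervals)
library(dplyr)
library(tibble)

# Simulate log-normally distributed outcomes
set.seed(42)
x1 <- runif(1000)
x2 <- runif(1000)
y <- rlnorm(1000, meanlog = x1 + x2, sdlog = 0.5)
df <- tibble(x1, x2, y)

# Split into training, calibration and test partitions
df_train <- df %>% slice(1:500)
df_cal <- df %>% slice(501:750)
df_test <- df %>% slice(751:1000)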
# Fit a log-linear model and back-transform predictions to the original scale
mod <- lm(log(y) ~ x1 + x2, data = df_train)
calib <- exp(predict(mod, newdata = df_cal))
calib_truth <- df_cal$y
pred_test <- exp(predict(mod, newdata = df_test))
# Normal prediction intervals
pinterval_parametric(pred = pred_test,
dist = 'norm',
pars = list(mean = pred_test,
sd = sqrt(mean((calib - calib_truth)^2))))
# Log-normal prediction intervals (meanlog is on the log scale,
# so the back-transformed predictions are logged again)
pinterval_parametric(pred = pred_test,
dist = 'lnorm',
pars = list(meanlog = log(pred_test),
sdlog = sqrt(mean((log(calib) - log(calib_truth))^2))))
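A quick sanity check is the empirical coverage on the test partition, which should be close to 1 - alpha when the distributional assumption holds. The sketch below repeats the log-normal example in base R only, computing the bounds directly with `qlnorm` under the assumption of equal-tailed intervals rather than calling `pinterval_parametric`.

```r
set.seed(42)
n  <- 1000
x1 <- runif(n)
x2 <- runif(n)
y  <- rlnorm(n, meanlog = x1 + x2, sdlog = 0.5)
df <- data.frame(x1, x2, y)
df_train <- df[1:500, ]
df_cal   <- df[501:750, ]
df_test  <- df[751:1000, ]

mod <- lm(log(y) ~ x1 + x2, data = df_train)
# Root mean squared error on the log scale, estimated on the calibration partition
sdlog_hat <- sqrt(mean((predict(mod, newdata = df_cal) - log(df_cal$y))^2))

alpha   <- 0.1
mu_test <- predict(mod, newdata = df_test)  # predicted meanlog
lower <- qlnorm(alpha / 2, meanlog = mu_test, sdlog = sdlog_hat)
upper <- qlnorm(1 - alpha / 2, meanlog = mu_test, sdlog = sdlog_hat)

# Share of test outcomes that fall inside their interval
coverage <- mean(df_test$y >= lower & df_test$y <= upper)
coverage
```

Running the same check with the normal intervals from the first example is a simple way to see how a misspecified distribution affects coverage.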