ipd-package {ipd}    R Documentation

ipd: Inference on Predicted Data

Description

Performs valid statistical inference on predicted data (IPD) using recent methods, for settings where the outcomes for a subset of the data have been predicted by an algorithm. Provides a wrapper function with specified defaults for the type of model and the method used for estimation and inference, as well as methods for tidying and summarizing results. See Salerno et al. (2024) doi:10.48550/arXiv.2410.09665.

The ipd package provides tools for statistical modeling and inference when a significant portion of the outcome data is predicted by AI/ML algorithms. It implements several state-of-the-art methods for inference on predicted data (IPD), offering a user-friendly interface to facilitate their use in real-world applications.

Details

This package is particularly useful in scenarios where predicted values (e.g., from machine learning models) are used as proxies for unobserved outcomes, which can introduce biases in estimation and inference. The ipd package integrates methods designed to address these challenges.
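To make the bias concrete, the following base R simulation is a minimal sketch under assumed settings. It does not use the ipd package or its simdat() function, and the predictor f is hypothetical. The outcome is generated with a true slope of 1. The predictor shrinks outcomes toward zero and adds noise, as an imperfect ML model might. Regressing f on the covariate in place of the unobserved outcome therefore estimates a slope of roughly 0.5 rather than 1.

```r
# Minimal sketch (hypothetical simulation, not the ipd package's simdat()):
# treating ML predictions as if they were observed outcomes biases the slope.
set.seed(1)

n <- 5000
x <- rnorm(n)
y <- 1 * x + rnorm(n)                 # true slope of Y on X is 1

# Hypothetical imperfect predictor: shrinks y by half and adds noise,
# so E[f | x] = 0.5 * x
f <- 0.5 * y + rnorm(n, sd = 0.5)

# Slope using observed outcomes (unavailable for most units in practice)
slope_true  <- unname(coef(lm(y ~ x))["x"])
# Naive slope using predictions as proxies for the outcome
slope_naive <- unname(coef(lm(f ~ x))["x"])

round(c(observed = slope_true, naive = slope_naive), 2)
```

The naive fit also reports standard errors as though f were observed data, so its confidence intervals are centered near the wrong value.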

Features

Key Functions

Documentation

The package includes detailed documentation for each function, including usage examples. A vignette is also provided to guide users through common workflows and applications of the package.

References

For details on the statistical methods implemented in this package, please refer to the associated manuscripts at the following references:

Author(s)

Maintainer: Stephen Salerno ssalerno@fredhutch.org (ORCID) [copyright holder]

Authors:

See Also

Useful links:

Examples

#-- Generate Example Data

set.seed(12345)

dat <- simdat(n = c(300, 300, 300), effect = 1, sigma_Y = 1)

head(dat)

formula <- Y - f ~ X1

#-- PostPI Analytic Correction (Wang et al., 2020)

fit_postpi1 <- ipd(formula, method = "postpi_analytic", model = "ols",
    data = dat, label = "set_label")

#-- PostPI Bootstrap Correction (Wang et al., 2020)

nboot <- 200

fit_postpi2 <- ipd(formula, method = "postpi_boot", model = "ols",
    data = dat, label = "set_label", nboot = nboot)

#-- PPI (Angelopoulos et al., 2023)

fit_ppi <- ipd(formula, method = "ppi", model = "ols",
    data = dat, label = "set_label")

#-- PPI++ (Angelopoulos et al., 2023)

fit_plusplus <- ipd(formula, method = "ppi_plusplus", model = "ols",
    data = dat, label = "set_label")

#-- PSPA (Miao et al., 2023)

fit_pspa <- ipd(formula, method = "pspa", model = "ols",
    data = dat, label = "set_label")

#-- Print the Model

print(fit_postpi1)

#-- Summarize the Model

summ_fit_postpi1 <- summary(fit_postpi1)

#-- Print the Model Summary

print(summ_fit_postpi1)

#-- Tidy the Model Output

tidy(fit_postpi1)

#-- Get a One-Row Summary of the Model

glance(fit_postpi1)

#-- Augment the Original Data with Fitted Values and Residuals

augmented_df <- augment(fit_postpi1)

head(augmented_df)
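For intuition about what a method such as "ppi" corrects, the base R code below sketches the classical prediction-powered point estimate for OLS described by Angelopoulos et al. (2023). It first regresses the predictions on X in the unlabeled set, then subtracts a "rectifier" estimated on the labeled set by regressing the prediction errors (f - Y) on X. This is an illustration under stated assumptions, not the ipd package's internal implementation. The helper ppi_ols_sketch() and the simulated data are hypothetical, and standard errors are omitted.

```r
# Minimal sketch of the classical PPI point estimate for OLS
# (Angelopoulos et al., 2023). Illustration only: ppi_ols_sketch() is a
# hypothetical helper, NOT ipd's internal code. No standard errors.
ppi_ols_sketch <- function(X_l, Y_l, f_l, X_u, f_u) {
  # Closed-form OLS coefficients: (X'X)^{-1} X'y
  ols <- function(X, y) solve(crossprod(X), crossprod(X, y))
  theta_f   <- ols(X_u, f_u)        # predictions regressed on X (unlabeled set)
  rectifier <- ols(X_l, f_l - Y_l)  # prediction error regressed on X (labeled set)
  drop(theta_f - rectifier)
}

# Hypothetical simulated data (not simdat() output); true coefficients (0, 1)
set.seed(2)
n_l <- 300
n_u <- 3000
make_x <- function(n) cbind(`(Intercept)` = 1, X1 = rnorm(n))
X_l <- make_x(n_l)
X_u <- make_x(n_u)
Y_l <- drop(X_l %*% c(0, 1)) + rnorm(n_l)
Y_u <- drop(X_u %*% c(0, 1)) + rnorm(n_u)  # unobserved in practice

# Hypothetical imperfect predictor applied to both sets
predict_f <- function(y) 0.5 * y + rnorm(length(y), sd = 0.5)
f_l <- predict_f(Y_l)
f_u <- predict_f(Y_u)

theta_ppi   <- ppi_ols_sketch(X_l, Y_l, f_l, X_u, f_u)
theta_naive <- drop(solve(crossprod(X_u), crossprod(X_u, f_u)))

rbind(ppi = theta_ppi, naive = theta_naive)
```

Here the naive slope lands near 0.5, while the rectified estimate recovers a slope near 1 at the cost of extra variance from the smaller labeled set. For actual analyses, the ipd() calls in the examples above also handle inference.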

[Package ipd version 0.1.4 Index]