IPCR {OPCreg} | R Documentation |
Incremental Principal Component Regression for Online Datasets
Description
The IPCR function implements an incremental Principal Component Regression (PCR) method designed to handle online datasets. It updates the principal components recursively as new data arrives, making it suitable for real-time data processing.
Usage
IPCR(data, eta, m, alpha)
Arguments
data |
A data frame where the first column is the response variable and the remaining columns are predictor variables. |
eta |
The proportion of the sample used to initialize the principal components (0 < eta < 1); the first round(eta * n) of the n observations form the initial block. Default is 0.0035. |
m |
The number of principal components to retain. Default is 3. |
alpha |
The significance level used for calculating critical values. Default is 0.05. |
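As a quick illustration of how eta translates into an initialization block, the sketch below applies the rule n0 = round(eta * n) from the Details section to the sample size used in the Examples. The variable names are illustrative only, and the check that n0 exceeds m is an assumption about sensible usage, not a documented requirement of IPCR.

```r
# Illustrative only: how many rows seed the initial principal components.
n   <- 2000      # total number of observations (as in the Examples)
eta <- 0.0035    # documented default proportion
m   <- 3         # number of components to retain

n0 <- round(eta * n)   # size of the initialization block
n0

# Assumption (not stated in this documentation): n0 should exceed m so that
# the initial covariance estimate has enough rank to support m components.
n0 > m
```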
Details
The IPCR function performs the following steps:
1. Standardizes the predictor variables.
2. Initializes the principal components using the first n0 = round(eta * n) samples.
3. Recursively updates the principal components as each new sample arrives.
4. Fits a linear regression model using the principal component scores.
5. Back-transforms the regression coefficients to the original scale.
This method is particularly useful for datasets where new observations are continuously added, and the model needs to be updated incrementally.
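The five steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the OPCreg source: the function name ipcr_sketch and the toy data are hypothetical, the recursion shown is a standard rank-one update of the sample mean and covariance (the package may use a different recursion), and the eigendecomposition is computed once after the last update for brevity, whereas a streaming implementation would refresh it as each sample arrives.

```r
# Hypothetical sketch of the documented steps; NOT the package implementation.
ipcr_sketch <- function(data, eta = 0.05, m = 2) {
  y    <- data[[1]]
  Xraw <- as.matrix(data[, -1, drop = FALSE])
  n    <- nrow(Xraw)

  # Step 1: standardize the predictors (centers and scales are kept for step 5).
  X       <- scale(Xraw)
  centers <- attr(X, "scaled:center")
  scales  <- attr(X, "scaled:scale")

  # Step 2: initialize mean and covariance from the first n0 rows.
  n0 <- max(2, round(eta * n))   # assumption: at least 2 rows are needed
  mu <- colMeans(X[1:n0, , drop = FALSE])
  S  <- cov(X[1:n0, , drop = FALSE])

  # Step 3: rank-one update of mean and covariance for each new row.
  for (i in seq_len(n - n0)) {
    k  <- n0 + i - 1             # number of rows seen so far
    d  <- X[k + 1, ] - mu        # deviation of the new row from the old mean
    S  <- (k - 1) / k * S + tcrossprod(d) / (k + 1)
    mu <- mu + d / (k + 1)
  }
  # Leading m eigenvectors of the updated covariance (computed once here).
  V <- eigen(S, symmetric = TRUE)$vectors[, 1:m, drop = FALSE]

  # Step 4: regress the response on the principal component scores.
  Z   <- X %*% V
  fit <- lm(y ~ Z)

  # Step 5: map the score coefficients back to the original predictor scale.
  beta      <- as.vector(V %*% coef(fit)[-1]) / scales
  intercept <- coef(fit)[1] - sum(beta * centers)
  Bhat      <- c(intercept, beta)
  names(Bhat) <- c("(Intercept)", colnames(Xraw))

  yhat <- fitted(fit)
  list(Bhat = Bhat, yhat = yhat, RMSE = sqrt(mean((y - yhat)^2)),
       summary = summary(fit))
}

# Toy usage with made-up data.
set.seed(1)
toy <- data.frame(y = 0, x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
toy$y <- 1 + 2 * toy$x1 - toy$x2 + rnorm(200, sd = 0.1)
fit <- ipcr_sketch(toy, eta = 0.05, m = 2)
fit$Bhat
```

Retaining all components (m equal to the number of predictors) makes this sketch reproduce ordinary least squares, which is a convenient sanity check on the back-transformation in step 5.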
Value
A list containing the following elements:
Bhat |
The estimated regression coefficients, including the intercept. |
RMSE |
The Root Mean Square Error of the regression model. |
summary |
The summary of the linear regression model. |
yhat |
The predicted values of the response variable. |
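Because Bhat is reported on the original scale with the intercept included, predictions for new rows can be formed with a single matrix product. The sketch below uses a stand-in list built from lm() in place of an actual IPCR result, and it assumes that the intercept is the first element of Bhat, which this page does not state explicitly. The RMSE line uses the conventional definition, which may differ from the package's value by a degrees-of-freedom factor.

```r
# Stand-in for an IPCR result (hypothetical; built from lm() for illustration).
set.seed(42)
train <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
train$y <- 0.5 + 1.5 * train$x1 - 2 * train$x2 + rnorm(50, sd = 0.2)
ols <- lm(y ~ x1 + x2, data = train)
result <- list(Bhat = coef(ols), yhat = fitted(ols))

# Assumption: Bhat is ordered as (intercept, x1, x2, ...).
newx <- data.frame(x1 = c(0, 1), x2 = c(0, -1))
ynew <- drop(cbind(1, as.matrix(newx)) %*% result$Bhat)
ynew

# Conventional RMSE computed from the fitted values.
rmse <- sqrt(mean((train$y - result$yhat)^2))
rmse
```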
See Also
lm: For fitting linear models.
eigen: For computing eigenvalues and eigenvectors.
Examples
## Not run:
set.seed(1234)
library(MASS)    # provides mvrnorm()
library(OPCreg)  # provides IPCR()
n <- 2000
p <- 10
mu0 <- as.matrix(runif(p, 0))
sigma0 <- as.matrix(runif(p, 0, 10))
ro <- as.matrix(c(runif(round(p / 2), -1, -0.8), runif(p - round(p / 2), 0.8, 1)))
R0 <- ro %*% t(ro)
diag(R0) <- 1
Sigma0 <- sigma0 %*% t(sigma0) * R0
x <- mvrnorm(n, mu0, Sigma0)
colnames(x) <- paste("x", 1:p, sep = "")
e <- rnorm(n, 0, 1)
B <- sample(1:3, (p + 1), replace = TRUE)
en <- matrix(rep(1, n * 1), ncol = 1)
y <- cbind(en, x) %*% B + e
colnames(y) <- "y"
data <- data.frame(cbind(y, x))
result <- IPCR(data = data, m = 3, eta = 0.0035, alpha = 0.05)
print(result$Bhat)
print(head(result$yhat))  # first few of the n fitted values
print(result$RMSE)
print(result$summary)
## End(Not run)