impSeqRob {rrcovNA} | R Documentation
Robust sequential imputation of missing values
Description
Impute missing multivariate data using a robust sequential algorithm.
Usage
impSeqRob(x, alpha=0.9, norm_impute=FALSE, check_data=FALSE, verbose=TRUE)
Arguments
x | the original incomplete data matrix.
alpha | the number of regular genes; the default is alpha=0.9.
norm_impute | If there are not enough complete observations and … The default is norm_impute=FALSE.
check_data | whether to check the variables: only numeric, non-discrete, with less than 50% NAs and with non-zero MAD. The default is check_data=FALSE.
verbose | whether to write messages about the checking of the data. The default is verbose=TRUE.
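The column screening that check_data describes (numeric, non-discrete, less than 50% NAs, non-zero MAD) can be pictured with the following base R sketch. It is not the rrcovNA implementation: the function name screen_columns, the rule that a numeric column with fewer than min_distinct distinct values counts as discrete, and the threshold of 5 are hypothetical. The returned element names simply mirror the corresponding entries in the Value section.

```r
# Hypothetical sketch of the check_data column screening (not rrcovNA code).
# Assumption: "discrete" means fewer than min_distinct distinct non-NA values.
screen_columns <- function(x, max_na = 0.5, min_distinct = 5) {
  x <- as.data.frame(x)
  is_num <- vapply(x, is.numeric, logical(1))
  na_frac <- vapply(x, function(v) mean(is.na(v)), numeric(1))
  n_distinct <- vapply(x, function(v) length(unique(v[!is.na(v)])), integer(1))
  # MAD is only evaluated for numeric columns; isTRUE() treats NA as "not zero"
  zero_mad <- vapply(x, function(v) is.numeric(v) && isTRUE(mad(v, na.rm = TRUE) == 0),
                     logical(1))
  keep <- is_num & na_frac < max_na & n_distinct >= min_distinct & !zero_mad
  list(colInAnalysis   = which(keep),
       namesNotNumeric = names(x)[!is_num],
       namesNAcol      = names(x)[na_frac >= max_na],
       namesDiscrete   = names(x)[is_num & n_distinct < min_distinct],
       namesZeroScale  = names(x)[zero_mad])
}
```

For example, a constant-valued column ends up in namesZeroScale (and, under the assumed rule, in namesDiscrete), while a character column is reported in namesNotNumeric.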
Details
The nonrobust version, SEQimpute, starts from a complete subset Xc of the data set and sequentially estimates the missing values of an incomplete observation, say x*, by minimizing the determinant of the covariance matrix of the augmented data matrix X* = [Xc; x*']. The imputed observation x* is then added to the complete data matrix, and the algorithm continues with the next observation that has missing values.
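Adding one row x* to Xc changes the determinant of the scatter matrix by the factor 1 + d^2/(n+1), where d^2 is the Mahalanobis distance of x* from the mean of Xc. Minimizing the determinant over the missing entries is therefore the same as minimizing d^2, whose minimizer is the conditional mean x*_m = mu_m + S_mo S_oo^{-1} (x*_o - mu_o). Here mu and S are the mean and covariance of the current complete block, and o and m index the observed and missing entries. The base R sketch below applies that formula row by row. The function name seq_impute, the processing order (fewest missing values first) and the guard on the size of Xc are assumptions, not details taken from SEQimpute.

```r
# Minimal sketch of nonrobust sequential imputation (SEQimpute-style).
# Assumption: rows are processed in order of increasing number of NAs.
seq_impute <- function(x) {
  x <- as.matrix(x)
  miss <- !complete.cases(x)
  xc <- x[!miss, , drop = FALSE]                     # complete subset Xc
  if (nrow(xc) <= ncol(x)) stop("too few complete rows to estimate the covariance")
  todo <- which(miss)
  todo <- todo[order(rowSums(is.na(x[todo, , drop = FALSE])))]
  for (i in todo) {
    mu <- colMeans(xc)                               # mean of current complete block
    S <- cov(xc)                                     # covariance of current complete block
    xi <- x[i, ]
    m <- is.na(xi)
    o <- !m
    # conditional mean of the missing part given the observed part
    xi[m] <- if (any(o)) {
      mu[m] + S[m, o, drop = FALSE] %*% solve(S[o, o, drop = FALSE], xi[o] - mu[o])
    } else mu[m]
    x[i, ] <- xi
    xc <- rbind(xc, xi)                              # augment: X* = [Xc; x*']
  }
  x
}
```

Because each imputed row joins Xc, later rows are imputed from a growing, and increasingly imputed, reference set. This is exactly what makes the plain version sensitive to outliers.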
Because SEQimpute uses the sample mean and covariance matrix, it is vulnerable to the influence of outliers; it can be made robust by plugging in robust estimators of location and scatter. One possible solution is to use the outlyingness measure proposed by Stahel (1981) and Donoho (1982), which has been used successfully for outlier identification in Hubert et al. (2005). The outlyingness can first be computed for the complete observations only; once an incomplete observation has been imputed (sequentially), its outlyingness can be computed as well and used to decide whether it is an outlier. If its outlyingness does not exceed a predefined threshold, the observation is included in the further steps of the algorithm.
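A base R sketch of the robust variant follows. It approximates the Stahel-Donoho outlyingness by the largest median/MAD-standardized deviation over a set of random unit directions. It uses the mean and covariance of the rows judged regular as the plug-in location and scatter, and it admits an imputed row into that set only when its outlyingness stays below a cutoff. The function names, the number of directions (ndir = 500), the random-direction approximation and the cutoff sqrt(qchisq(0.975, p)) are all assumptions for illustration. They are not the internals of impSeqRob, and the cutoff is not derived from its alpha argument.

```r
# Sketch: robust sequential imputation with a Stahel-Donoho-type outlyingness.
# Assumptions: random unit directions, median/MAD per direction,
# cutoff sqrt(qchisq(0.975, p)), mean/cov of regular rows as plug-in estimates.
outlyingness <- function(ref, z, dirs) {
  pr_ref <- ref %*% t(dirs)                      # projections of reference rows
  pr_z <- z %*% t(dirs)                          # projections of rows to score
  med <- apply(pr_ref, 2, median)
  s <- apply(pr_ref, 2, mad)
  ok <- s > 0                                    # skip degenerate directions
  r <- abs(sweep(pr_z[, ok, drop = FALSE], 2, med[ok]))
  apply(sweep(r, 2, s[ok], "/"), 1, max)         # worst standardized deviation
}

robust_seq_impute <- function(x, ndir = 500, cutoff = sqrt(qchisq(0.975, ncol(x)))) {
  x <- as.matrix(x)
  p <- ncol(x)
  dirs <- matrix(rnorm(ndir * p), ndir, p)
  dirs <- dirs / sqrt(rowSums(dirs^2))           # unit-length directions
  miss <- !complete.cases(x)
  xc <- x[!miss, , drop = FALSE]
  outl <- rep(NA_real_, nrow(x))
  outl[!miss] <- outlyingness(xc, xc, dirs)      # complete observations first
  clean <- xc[outl[!miss] <= cutoff, , drop = FALSE]
  for (i in which(miss)) {
    mu <- colMeans(clean)
    S <- cov(clean)
    xi <- x[i, ]
    m <- is.na(xi)
    o <- !m
    xi[m] <- if (any(o)) {
      mu[m] + S[m, o, drop = FALSE] %*% solve(S[o, o, drop = FALSE], xi[o] - mu[o])
    } else mu[m]
    x[i, ] <- xi
    outl[i] <- outlyingness(clean, matrix(xi, nrow = 1), dirs)
    if (outl[i] <= cutoff) clean <- rbind(clean, xi)  # keep only regular rows
  }
  list(x = x, outl = outl, flag = outl > cutoff)
}
```

Returning x, outl and flag loosely mirrors the first three elements listed under Value. Because the directions are random, the outlyingness values (and hence borderline flags) vary with the random seed.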
Value
A list containing the following elements:
x | a matrix of the same form as x, with the missing values imputed.
outl | the outlyingness computed for all observations.
flag | the outlier flags for the observations.
colInAnalysis | the indices of the columns used in the analysis.
namesNotNumeric | the names of the variables that are not numeric.
namesNAcol | the names of the columns left out because they contain too many NAs.
namesDiscrete | the names of the discrete variables.
namesZeroScale | the names of the variables with zero scale.
References
S. Verboven, K. Vanden Branden and P. Goos (2007). Sequential imputation for missing values. Computational Biology and Chemistry, 31, 320–327.
K. Vanden Branden and S. Verboven (2009). Robust Data Imputation. Computational Biology and Chemistry, 33, 7–13.
Examples
data(bush10)
impSeqRob(bush10) # impute the missing data sequentially