mvout {mvout} | R Documentation |
Robust Multivariate Outlier Detection
Description
Detection of multivariate outliers using robust estimates of location and scale.
Usage
mvout(x, method = c("none", "princomp", "factanal"), standardize = TRUE,
robust = TRUE, direction = rep("two.sided", ncol(x)), thresh = 0.01,
keepx = TRUE, factors = 2, scores = c("regression", "Bartlett"),
rotation = c("none", "varimax", "promax"), ...)
Arguments
x |
Data matrix (n x p) |
method |
Character specifying the factorization method used to define the covariance matrix: "none" uses the unfactorized (robust) covariance matrix, "princomp" uses the (robust) principal components analysis (PCA) implied covariance matrix, and "factanal" uses the (robust) factor analysis (FA) implied covariance matrix. |
standardize |
Logical specifying whether to apply PCA to the correlation (default) or covariance matrix. Ignored if |
robust |
If |
direction |
Direction defining "outlier" for each variable (character). Three options are available: "two.sided" considers large positive and negative deviations from the mean as outliers, "less" only considers large negative deviations as outliers, and "greater" only considers large positive deviations as outliers. Accepts a single character string giving the common direction for all variables, or a character vector of length p. |
thresh |
Scalar specifying the threshold for flagging outliers (0 < thresh < 1). See Note. |
keepx |
Logical indicating if input |
factors |
Integer giving the number of factors for PCA or FA model. Ignored if |
scores |
Method used to compute factor scores (only used if |
rotation |
Factor rotation method applied to PCA or FA loadings. Ignored if |
... |
Additional arguments passed to the |
Details
Outliers are determined using a (squared) Mahalanobis distance calculated using either the Minimum Covariance Determinant (MCD) estimator for the mean vector and covariance matrix (default) or the standard unbiased sample estimators. The MCD is computed using the covMcd function. The function includes options for specifying the direction of interest for outlier detection, as well as options for using bilinear models (PCA and FA) to define the covariance matrix used for the Mahalanobis distance.
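As a minimal sketch of the distance computation described above (not the package's internal code), squared Mahalanobis distances can be obtained with the base R function mahalanobis. The classical branch uses the sample mean and unbiased covariance; the robust branch assumes that covMcd is the function of that name in the robustbase package and is skipped if that package is not installed. The simulated data and all variable names are illustrative only.

```r
# Sketch: squared Mahalanobis distances from classical or MCD estimates.
set.seed(0)
n <- 200
p <- 2
x <- matrix(rnorm(n * p), n, p)

# classical (non-robust) estimates: sample mean and unbiased covariance
d2_classical <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# robust (MCD) estimates, computed only if robustbase is available
if (requireNamespace("robustbase", quietly = TRUE)) {
  set.seed(1)  # the default MCD search uses random subsets
  mcd <- robustbase::covMcd(x)
  d2_robust <- mahalanobis(x, center = mcd$center, cov = mcd$cov)
}

# distances are non-negative, one per observation
summary(d2_classical)
```

Note that mahalanobis returns the squared distance, which matches the "(squared)" wording used in this page.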
Value
An object of class mvout, which is a list with the following components:
distance |
Numeric vector of (squared) Mahalanobis distances for the n observations. |
outlier |
Logical vector indicating whether or not each of the n observations is an outlier. |
mcd |
Object of class |
args |
List of input arguments (e.g., x, method, standardize, etc.) |
scores |
Factor or principal component scores (will be |
loadings |
Factor or principal component loadings (will be |
uniquenesses |
Variable uniquenesses (will be |
invrot |
Inverse of the matrix that was used to rotate the loadings (will be |
cormat |
Factor or principal component score correlation matrix (will be |
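The following sketch shows one way to inspect the distance and outlier components listed above. It assumes the mvout package is installed (the code is skipped otherwise) and uses simulated data purely for illustration; only the documented components are accessed.

```r
# Sketch: inspecting the documented components of an mvout object.
set.seed(0)
x <- matrix(rnorm(200 * 2), 200, 2)

if (requireNamespace("mvout", quietly = TRUE)) {
  set.seed(1)  # for reproducible MCD estimate
  out <- mvout::mvout(x)

  # indices of the observations flagged as outliers
  print(which(out$outlier))

  # distribution of the squared Mahalanobis distances
  print(summary(out$distance))
}
```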
Warning
The default behavior of the covMcd function (and, consequently, the mvout function) is for the MCD estimator to be computed from 500 random subsets of the observations (nsamp = 500), so results can vary between runs unless the random seed is fixed. The nsamp argument of the covMcd function can be used to control the number of subsets or request a different method (e.g., nsamp = "deterministic").
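A short sketch of the two search strategies follows. It calls covMcd directly, assuming it is the robustbase function of that name, and is skipped if robustbase is not installed. Whether nsamp can be forwarded through the ... argument of mvout is presumed from the text above but not demonstrated here.

```r
# Sketch: controlling how covMcd searches for the MCD solution.
set.seed(0)
x <- matrix(rnorm(200 * 2), 200, 2)

if (requireNamespace("robustbase", quietly = TRUE)) {
  # default search: random subsets, so the result depends on the seed
  set.seed(1)
  mcd_random <- robustbase::covMcd(x)

  # deterministic search: no random subsets, so no seed is required
  mcd_det <- robustbase::covMcd(x, nsamp = "deterministic")

  # compare the two robust location estimates
  print(mcd_random$center)
  print(mcd_det$center)
}
```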
Note
For observations included in the (robust) covariance calculation, the critical value that designates an observation as an outlier is defined as qchisq(1 - thresh, df = p).
For the excluded observations, the critical value is defined as qf(1 - thresh, df1 = p, df2 = n - p) * ((n - 1) * p / (n - p)).
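To make the two formulas concrete, the sketch below evaluates both critical values for the sample size and dimension used in the Examples (n = 200, p = 2) with the default thresh = 0.01. The variable names are illustrative and not part of the package.

```r
# Sketch: the two critical values from the Note, for n = 200 and p = 2.
n <- 200
p <- 2
thresh <- 0.01

# observations included in the (robust) covariance calculation
crit_included <- qchisq(1 - thresh, df = p)

# observations excluded from the (robust) covariance calculation
crit_excluded <- qf(1 - thresh, df1 = p, df2 = n - p) * ((n - 1) * p / (n - p))

c(included = crit_included, excluded = crit_excluded)
```

A squared Mahalanobis distance above the relevant critical value flags the observation as an outlier; a smaller thresh raises both critical values and therefore flags fewer observations.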
Author(s)
Jesus E. Delgado <delga220@umn.edu> Nathaniel E. Helwig <helwig@umn.edu>
References
Todorov, V., & Filzmoser, P. (2009). An Object-Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1-47.
See Also
predict.mvout for obtaining predictions from mvout objects.
Examples
# generate some data
n <- 200
p <- 2
set.seed(0)
x <- matrix(rnorm(n * p), n, p)
# thresh = 0.01
set.seed(1) # for reproducible MCD estimate
out1 <- mvout(x)
plot(out1)
# thresh = 0.05
set.seed(1) # for reproducible MCD estimate
out5 <- mvout(x, thresh = 0.05)
plot(out5)
# direction = "greater"
set.seed(1) # for reproducible MCD estimate
out <- mvout(x, direction = "greater", thresh = 0.05)
plot(out)
# direction = c("greater", "less")
set.seed(1) # for reproducible MCD estimate
out <- mvout(x, direction = c("greater", "less"), thresh = 0.05)
plot(out)