mvout {mvout}    R Documentation

Robust Multivariate Outlier Detection

Description

Detection of multivariate outliers using robust estimates of location and scale.

Usage

mvout(x, method = c("none", "princomp", "factanal"), standardize = TRUE,
      robust = TRUE, direction = rep("two.sided", ncol(x)), thresh = 0.01, 
      keepx = TRUE, factors = 2, scores = c("regression", "Bartlett"), 
      rotation = c("none", "varimax", "promax"), ...)

Arguments

x

Data matrix (n x p)

method

Character specifying the factorization method used to define the covariance matrix: "none" uses the unfactorized (robust) covariance matrix, "princomp" uses the (robust) principal components analysis (PCA) implied covariance matrix, and "factanal" uses the (robust) factor analysis (FA) implied covariance matrix.

standardize

Logical specifying whether to apply PCA to the correlation (default) or covariance matrix. Ignored if method = "none" or method = "factanal".

robust

If TRUE (default), robust estimates of the mean vector and covariance matrix are obtained using the covMcd function. Otherwise standard estimators are obtained using the colMeans and cov functions.

direction

Direction defining "outlier" for each variable (character). Three options are available: "two.sided" considers large positive and negative deviations from the mean as outliers, "less" only considers large negative deviations as outliers, and "greater" only considers large positive deviations as outliers. Accepts a single character string giving the common direction for all variables, or a character vector of length p.

thresh

Scalar specifying the threshold for flagging outliers (0 < thresh < 1). See Note.

keepx

Logical indicating if input x should be saved and returned as part of the output.

factors

Integer giving the number of factors for the PCA or FA model. Ignored if method = "none".

scores

Method used to compute factor scores (only used if method = "factanal").

rotation

Factor rotation method applied to PCA or FA loadings. Ignored if method = "none".

...

Additional arguments passed to the covMcd function, e.g., alpha, nsamp, etc. Note that the cor argument should not be used, as this is controlled by the standardize argument.

Details

Outliers are determined using a (squared) Mahalanobis distance, calculated using either the Minimum Covariance Determinant (MCD) estimator for the mean vector and covariance matrix (default) or the standard unbiased sample estimators. The MCD is computed using the covMcd function. The function also provides options for specifying the direction of interest for outlier detection, as well as options for using bilinear models (PCA and FA) to define the covariance matrix used for the Mahalanobis distance.
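The distance computation described above can be sketched with base R's mahalanobis function. This is only an illustration of the general idea, not the internal code of mvout: it ignores the direction and method options, and it assumes that covMcd refers to the function of that name in the robustbase package (the robust part is skipped if that package is not installed).

```r
# simulated data (same setup as in the Examples)
set.seed(0)
n <- 200
p <- 2
x <- matrix(rnorm(n * p), n, p)

# squared Mahalanobis distances from the standard (non-robust) estimators
d2_classic <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# squared Mahalanobis distances from the MCD estimates
# (assumption: covMcd is robustbase::covMcd; skipped if unavailable)
if (requireNamespace("robustbase", quietly = TRUE)) {
  set.seed(1)  # for reproducible MCD estimate
  mcd <- robustbase::covMcd(x)
  d2_robust <- mahalanobis(x, center = mcd$center, cov = mcd$cov)
}
```

Comparing d2_classic with d2_robust on contaminated data is one way to see why the robust estimates are the default: gross outliers inflate the standard covariance estimate and can mask themselves.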

Value

An object of class mvout which is a list with the following components:

distance

Numeric vector of (squared) Mahalanobis distances for the n observations.

outlier

Logical vector indicating whether or not each of the n observations is an outlier.

mcd

Object of class mcd that is output from the covMcd function.

args

List of input arguments (e.g., x, method, standardize).

scores

Factor or principal component scores (will be NULL if method = "none").

loadings

Factor or principal component loadings (will be NULL if method = "none").

uniquenesses

Variable uniquenesses (will be NULL if method = "none").

invrot

Inverse of the matrix that was used to rotate the loadings (will be NULL if method = "none").

cormat

Factor or principal component score correlation matrix (will be NULL if method = "none").

Warning

The default behavior of the covMcd function (and, consequently, the mvout function) is for the MCD estimator to be computed from 500 random subsets of the observations (nsamp = 500), so results can vary between runs unless the random seed is set. The nsamp argument of the covMcd function can be used to control the number of subsets or to request a different method (e.g., nsamp = "deterministic").
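As a hedged illustration, nsamp can be forwarded to covMcd through the ... argument of mvout. The sketch below assumes the mvout package is installed and does nothing otherwise; the object name out_det is chosen here for illustration only.

```r
if (requireNamespace("mvout", quietly = TRUE)) {
  # simulated data (same setup as in the Examples)
  set.seed(0)
  x <- matrix(rnorm(200 * 2), 200, 2)

  # nsamp is passed on to covMcd via the ... argument
  out_det <- mvout::mvout(x, nsamp = "deterministic")

  # number of observations flagged as outliers
  print(sum(out_det$outlier))
}
```

With nsamp = "deterministic" the MCD fit should not depend on the random seed, so the set.seed call before mvout that appears in the Examples would not be needed for reproducibility.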

Note

For observations included in the (robust) covariance calculation, the critical value that designates an observation as an outlier is defined as qchisq(1 - thresh, df = p).

For the excluded observations, the critical value is defined as qf(1 - thresh, df1 = p, df2 = n - p) * ((n - 1) * p / (n - p)).
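The two critical values can be evaluated directly in base R. The values of n, p, and thresh below are assumptions chosen for illustration (they match the default threshold and the data dimensions used in the Examples), and the names crit_in and crit_out are hypothetical.

```r
# illustrative values (assumptions, not taken from a fitted object)
n <- 200        # number of observations
p <- 2          # number of variables
thresh <- 0.01  # default threshold

# critical value for observations included in the (robust) covariance calculation
crit_in <- qchisq(1 - thresh, df = p)

# critical value for observations excluded from the covariance calculation
crit_out <- qf(1 - thresh, df1 = p, df2 = n - p) * ((n - 1) * p / (n - p))

print(c(included = crit_in, excluded = crit_out))
```

A squared Mahalanobis distance above the relevant critical value flags the observation as an outlier, so a smaller thresh gives larger critical values and fewer flagged observations.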

Author(s)

Jesus E. Delgado <delga220@umn.edu>, Nathaniel E. Helwig <helwig@umn.edu>

References

Todorov, V., & Filzmoser, P. (2009). An Object-Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1-47.

See Also

predict.mvout for obtaining predictions from mvout objects.

Examples

# generate some data
n <- 200
p <- 2
set.seed(0)
x <- matrix(rnorm(n * p), n, p)

# thresh = 0.01
set.seed(1)    # for reproducible MCD estimate
out1 <- mvout(x)
plot(out1)

# thresh = 0.05
set.seed(1)    # for reproducible MCD estimate
out5 <- mvout(x, thresh = 0.05)
plot(out5)

# direction = "greater"
set.seed(1)    # for reproducible MCD estimate
out <- mvout(x, direction = "greater", thresh = 0.05)
plot(out)

# direction = c("greater", "less")
set.seed(1)    # for reproducible MCD estimate
out <- mvout(x, direction = c("greater", "less"), thresh = 0.05)
plot(out)


[Package mvout version 1.2 Index]