quantify.outliers {OutSeekR}    R Documentation

Compute quantities for outlier detection

Description

Compute quantities for use in the detection of outliers. Specifically, compute either z-scores based on the mean and standard deviation, the trimmed mean and trimmed standard deviation, or the median and median absolute deviation, or else the cluster assignments from k-means with two clusters.

Usage

quantify.outliers(
  x,
  method = "mean",
  trim = 0,
  nstart = 1,
  exclude.zero = FALSE
)

Arguments

x

A numeric vector.

method

A string indicating the quantities to be computed. Possible values are

  • 'mean' : z-scores based on the mean and standard deviation, or on the trimmed mean and trimmed standard deviation if trim > 0,

  • 'median' : z-scores based on median and median absolute deviation, or

  • 'kmeans' : cluster assignment from k-means with two clusters.

The default is 'mean' (z-scores based on the mean and standard deviation).
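The two z-score variants can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper names zscore.mean and zscore.median are hypothetical, and the use of stats::mad() with its default scaling constant (1.4826) is an assumption, since this page does not say how the median absolute deviation is scaled.

```r
# Hypothetical helpers for illustration; not part of OutSeekR.
zscore.mean <- function(x) {
    # Center by the mean and scale by the standard deviation, ignoring NA.
    (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE);
    }
zscore.median <- function(x) {
    # Center by the median and scale by the median absolute deviation.
    # Assumption: stats::mad() default scaling (constant = 1.4826).
    (x - median(x, na.rm = TRUE)) / mad(x, na.rm = TRUE);
    }

x <- c(2, 4, 4, 5, 6, 30);
z.mean <- zscore.mean(x);
z.median <- zscore.median(x);
```

Because the median and MAD are less affected by the extreme value 30 than the mean and standard deviation are, the last element receives a much larger score under the median-based variant.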

trim

A number, the fraction of observations to be trimmed from each end of x. Default is 0 (no trimming).
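One way to picture trimming is sketched below. The helper trimmed.stats is hypothetical, and the rule it uses (drop floor(n * trim) observations from each end of the sorted data, as base::mean(x, trim = ...) does) is an assumption; the package may define the trimmed standard deviation differently. The sketch is intended for trim < 0.5.

```r
# Hypothetical helper for illustration; not part of OutSeekR.
trimmed.stats <- function(x, trim = 0) {
    x <- sort(x);              # sort() drops NA by default
    n <- length(x);
    k <- floor(n * trim);      # number of observations dropped from each end
    kept <- if (k > 0) x[(k + 1):(n - k)] else x;
    c(mean = mean(kept), sd = sd(kept));
    }

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100);
stats.untrimmed <- trimmed.stats(x, trim = 0);
stats.trimmed <- trimmed.stats(x, trim = 0.1);

# z-scores for all of x, using the trimmed summary statistics.
z.trimmed <- (x - stats.trimmed[['mean']]) / stats.trimmed[['sd']];
```

With trim = 0.1, the smallest and largest values (1 and 100) are left out of the summary statistics but still receive z-scores, so the extreme value stands out more sharply than it would against the untrimmed mean and standard deviation.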

nstart

A number, for k-means clustering, the number of random initial centers for the clusters. Default is 1. See stats::kmeans() for further information.
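The effect of nstart can be seen by calling stats::kmeans() directly. This is a sketch of the underlying base R call only; how quantify.outliers() passes its arguments through, and how it handles missing values before clustering, is not shown on this page.

```r
set.seed(1);
x <- c(1.0, 1.2, 0.9, 1.1, 1.3, 9.5, 10.0);

# Run k-means with two clusters from 10 random sets of initial centers
# and keep the best solution (lowest total within-cluster sum of squares).
fit <- stats::kmeans(x, centers = 2, nstart = 10);

# One integer label (1 or 2) per element of x.  Which group is labelled 1
# and which is labelled 2 is arbitrary and can change between runs.
clusters <- fit$cluster;
```

A larger nstart makes the result less dependent on an unlucky random start, at the cost of more computation.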

exclude.zero

A logical, whether zeros should be excluded (TRUE) or not excluded (FALSE, the default) from computations. For method = 'mean' and method = 'median', this means zeros will not be included in computing the summary statistics; for method = 'kmeans', this means zeros will be placed in their own cluster, coded 0.
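The behaviour described above can be sketched in base R. Both helpers are hypothetical and only mirror the description: in particular, giving zeros a z-score relative to the nonzero statistics, and leaving missing values as NA in the k-means output, are assumptions about details this page does not state.

```r
# Hypothetical helpers for illustration; not part of OutSeekR.
zscore.nonzero <- function(x) {
    # Summary statistics are computed on nonzero, non-missing values only.
    nonzero <- x[!is.na(x) & x != 0];
    (x - mean(nonzero)) / sd(nonzero);
    }
cluster.nonzero <- function(x, nstart = 1) {
    labels <- rep(NA_integer_, length(x));
    labels[!is.na(x) & x == 0] <- 0L;    # zeros form their own cluster, coded 0
    keep <- !is.na(x) & x != 0;
    labels[keep] <- stats::kmeans(x[keep], centers = 2, nstart = nstart)$cluster;
    labels;
    }

set.seed(1);
x <- c(0, 0, 0, 4, 5, 6, NA);
z.all <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE);
z.nonzero <- zscore.nonzero(x);
labels <- cluster.nonzero(x, nstart = 5);
```

Comparing z.all with z.nonzero shows how a block of zeros pulls the mean down and inflates the standard deviation when it is included.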

Value

A numeric vector of the same length as x whose values are the requested quantities computed on the corresponding elements of x.

Examples

# Generate fake data.
set.seed(1234);
x <- rgamma(
    n = 20,
    shape = 2,
    scale = 2
    );
# Add missing values and zeros for demonstration.  Missing values are
# ignored, and zeros can be ignored with `exclude.zero = TRUE`.
x[1:5] <- NA;
x[6:10] <- 0;

# Compute z-scores based on mean and standard deviation.
quantify.outliers(
    x = x,
    method = 'mean',
    trim = 0
    );
# Exclude zeros from the calculation of the mean and standard
# deviation.
quantify.outliers(
    x = x,
    method = 'mean',
    trim = 0,
    exclude.zero = TRUE
    );

# Compute z-scores based on the 5% trimmed mean and 5% trimmed
# standard deviation.
quantify.outliers(
    x = x,
    method = 'mean',
    trim = 0.05
    );

# Compute z-scores based on the median and median absolute deviation.
quantify.outliers(
    x = x,
    method = 'median'
    );

# Compute cluster assignments using k-means with k = 2.
quantify.outliers(
    x = x,
    method = 'kmeans'
    );
# Try 10 random sets of initial cluster centers.
quantify.outliers(
    x = x,
    method = 'kmeans',
    nstart = 10
    );
# Assign zeros to their own cluster.
quantify.outliers(
    x = x,
    method = 'kmeans',
    exclude.zero = TRUE
    );

[Package OutSeekR version 1.0.0 Index]