bayesHRT {HRTnomaly} | R Documentation
Calculate Cellwise Flags for Anomaly Detection Using Bayesian Testing
Description
The function uses the predictive posterior distribution based on empirical likelihoods to determine whether a data entry is an outlier or not.
The function takes a long-format data.frame object as input and returns it with two appended vectors.
The first vector contains the posterior probabilities as numbers between zero and one. The second vector provides a set of logical values indicating whether the data entry is an outlier (TRUE) or not (FALSE).
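The following base-R sketch applies Bayes' rule to three cells. It shows how a cell-level prior and two likelihoods combine into a posterior probability of being an outlier. It is a generic illustration and not the internal algorithm of bayesHRT(). The function posterior_outlier(), the likelihood values, and the 0.5 flagging threshold are all hypothetical.

```r
# Hypothetical illustration of Bayes' rule for individual cells;
# this is NOT the internal algorithm of bayesHRT().
posterior_outlier <- function(prior, lik_outlier, lik_regular) {
  num <- prior * lik_outlier
  num / (num + (1 - prior) * lik_regular)
}

# Made-up inputs for three cells
prior       <- c(0.5, 0.5, 0.1)
lik_outlier <- c(0.02, 0.30, 0.30)  # likelihood of the entry if it is an outlier
lik_regular <- c(0.40, 0.05, 0.05)  # likelihood of the entry if it is regular

post <- posterior_outlier(prior, lik_outlier, lik_regular)
flag <- post > 0.5                  # hypothetical decision threshold
print(data.frame(prior, post = round(post, 3), flag))
```

The second and third cells have identical likelihoods, but only the second is flagged. The smaller prior of the third cell pulls its posterior down to 0.4.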
Usage
bayesHRT(a, prior = NULL)
Arguments
a | A long-format data.frame object; see Details for the required columns. |
prior | A numerical value or vector of cell-level prior probabilities of observing an outlier. The default is NULL. |
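Details describes two sources for the cell-level priors: the optional "prior" column of a, and otherwise the prior argument. The sketch below shows one way to resolve a prior for every cell from those two sources. The helper resolve_prior() is hypothetical. Its specific rules are assumptions and not documented behavior of bayesHRT(): a scalar is recycled to every cell, a vector must have one value per row, and an error is raised when neither source is given.

```r
# Hypothetical helper (not part of HRTnomaly): returns one prior per row of a.
resolve_prior <- function(a, prior = NULL) {
  # A "prior" column in the data takes precedence over the argument
  if ("prior" %in% names(a)) return(a$prior)
  if (is.null(prior)) stop("no 'prior' column and no 'prior' argument")
  # Assumption: either a single value or one value per cell
  stopifnot(is.numeric(prior), length(prior) %in% c(1L, nrow(a)))
  rep_len(prior, nrow(a))
}

cells <- data.frame(unit_id = c("u1", "u2", "u3"), stringsAsFactors = FALSE)
resolve_prior(cells, prior = 0.05)               # scalar recycled to all cells
resolve_prior(cells, prior = c(0.01, 0.05, 0.2)) # one prior per cell
```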
Details
The argument a is provided as an object of class data.frame.
This object is treated as a long-format data.frame, and it must have at least five columns with the following names:
"strata": a character or factor column containing the stratification information.
"unit_id": a character or factor column containing the ID of the statistical unit in the survey sample.
"master_varname": a character column containing the name of the observed variable.
"current_value_num": a numeric column containing the observed value, i.e., a data entry.
"pred_value": a numeric column containing a value observed on a previous survey for the same variable, if available. If it is not available, the value can be set to NA or NaN. When working with longitudinal data, the value can be set to a time-series forecast or a filtered value.
"prior": an optional numeric column containing the prior probability of observing an outlier for the cell. If this column is omitted from the dataset, the function uses the values provided through the argument prior.
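Before calling bayesHRT(), it can help to verify that the input meets the column requirements listed above. The helper check_hrt_input() below is a hypothetical base-R sketch and is not part of HRTnomaly. The toy dataset is made up.

```r
# Hypothetical helper: checks the required columns and types from Details.
check_hrt_input <- function(a) {
  stopifnot(is.data.frame(a))
  required <- c("strata", "unit_id", "master_varname",
                "current_value_num", "pred_value")
  missing <- setdiff(required, names(a))
  if (length(missing) > 0L) {
    stop("missing columns: ", paste(missing, collapse = ", "))
  }
  is_chr_or_fct <- function(x) is.character(x) || is.factor(x)
  stopifnot(is_chr_or_fct(a$strata), is_chr_or_fct(a$unit_id),
            is.character(a$master_varname),
            is.numeric(a$current_value_num), is.numeric(a$pred_value))
  # The optional "prior" column must contain probabilities
  if ("prior" %in% names(a)) {
    stopifnot(is.numeric(a$prior), all(a$prior >= 0 & a$prior <= 1))
  }
  invisible(TRUE)
}

# A made-up long-format dataset: two units, two variables each
long_df <- data.frame(
  strata            = c("A", "A", "B", "B"),
  unit_id           = c("u1", "u1", "u2", "u2"),
  master_varname    = c("x", "y", "x", "y"),
  current_value_num = c(10, 200, 12, 5000),
  pred_value        = c(11, 210, NA, 190),  # NA: no previous value
  prior             = c(0.05, 0.05, 0.05, 0.05),
  stringsAsFactors  = FALSE
)
check_hrt_input(long_df)
```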
The input data.frame can contain additional columns. These extra columns are ignored in the analyses, but they are preserved and returned along with the results of the cellwise outlier-detection analysis.
The R packages dplyr, purrr, and tidyr are highly recommended to simplify the conversion of datasets between long and wide formats.
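Survey data stored in wide format (one row per unit, one column per variable) must be reshaped before being passed to bayesHRT(). The tidyr function pivot_longer() performs this reshaping. The base-R sketch below does the same without extra packages. The helper wide_to_long() and the input column names are hypothetical. The sketch also sets pred_value to NA, assuming that no previous survey values are available.

```r
# Hypothetical helper: stacks each value column of a wide data.frame into
# the long format described in Details (one row per unit and variable).
wide_to_long <- function(wide, id_cols, value_cols) {
  pieces <- lapply(value_cols, function(v) {
    data.frame(wide[id_cols],
               master_varname    = v,
               current_value_num = as.numeric(wide[[v]]),
               pred_value        = NA_real_,  # no previous values available
               stringsAsFactors  = FALSE)
  })
  long <- do.call(rbind, pieces)
  rownames(long) <- NULL
  long
}

wide <- data.frame(strata = c("A", "B"), unit_id = c("u1", "u2"),
                   x = c(10, 12), y = c(200, 5000),
                   stringsAsFactors = FALSE)
long <- wide_to_long(wide, id_cols = c("strata", "unit_id"),
                     value_cols = c("x", "y"))
print(long)
```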
Value
The long-format data.frame provided as input, with two extra columns appended: the outlier posterior probabilities derived from the posterior predictive distribution, and the logical anomaly flags.
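This page does not name the two appended columns, so the sketch below locates them by position instead. It assumes, following the Description, that the posterior probabilities and then the logical flags are the last two columns of the returned data.frame. The helper flagged_cells() and the mock result are hypothetical, including the mock column names post_prob and is_outlier.

```r
# Hypothetical helper: keeps only the flagged cells, sorted by decreasing
# posterior probability. Assumes the posterior probabilities and the
# logical flags are the last two columns, in that order.
flagged_cells <- function(res) {
  k <- ncol(res)
  post <- res[[k - 1L]]
  flag <- res[[k]]
  stopifnot(is.numeric(post), is.logical(flag))
  idx <- which(flag)                # which() drops NA flags
  res[idx[order(post[idx], decreasing = TRUE)], , drop = FALSE]
}

# Made-up stand-in for a bayesHRT() result; with the package installed,
# one could instead call flagged_cells(bayesHRT(toy, prior = 0.5)).
mock_res <- data.frame(
  unit_id           = c("u1", "u1", "u2", "u2"),
  master_varname    = c("x", "y", "x", "y"),
  current_value_num = c(10, 200, 12, 5000),
  post_prob         = c(0.02, 0.10, 0.03, 0.97),
  is_outlier        = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors  = FALSE
)
flagged_cells(mock_res)
```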
Author(s)
Luca Sartore drwolf85@gmail.com
Examples
# Load the package
library(HRTnomaly)
set.seed(2025L)
# Load the 'toy' data
data(toy)
# Detect cellwise outliers
res <- bayesHRT(toy[sample.int(100), ], prior = 0.5)