engineerMetric {DataSimilarity} | R Documentation |
Engineer Metric
Description
The function implements the L_q-engineer metric for comparing two multivariate distributions.
Usage
engineerMetric(X1, X2, type = "F", seed = NULL)
Arguments
X1 |
First dataset as matrix or data.frame |
X2 |
Second dataset as matrix or data.frame |
type |
Character specifying the type of |
seed |
Random seed (default: NULL). A random seed will only be set if one is provided. The method is deterministic; the seed is only set for consistency with other methods. |
Details
The engineer metric is a primary probability metric that is defined as
\text{EN}(X_1, X_2; q) = \left[ \sum_{i = 1}^{p} \left| \text{E}\left(X_{1i}\right) - \text{E}\left(X_{2i}\right)\right|^q\right]^{\min(q, 1/q)} \text{ with } q > 0,
where X_{1i}, X_{2i} denote the i-th component of the p-dimensional random vectors X_1\sim F_1 and X_2\sim F_2.
In the implementation, expectations are estimated by column means of the respective datasets.
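The plug-in estimate described above can be sketched in a few lines of base R. This is only an illustration of the formula, not the package's implementation: the helper name engineer_sketch and its argument q are hypothetical and are not part of DataSimilarity.

```r
# Hypothetical helper (not part of DataSimilarity): plug-in estimate of the
# L_q-engineer metric, with expectations estimated by column means.
engineer_sketch <- function(X1, X2, q = 1) {
  stopifnot(q > 0, ncol(X1) == ncol(X2))
  # Component-wise difference of the estimated expectations
  diff_means <- colMeans(as.matrix(X1)) - colMeans(as.matrix(X2))
  # Sum of absolute differences to the power q, raised to min(q, 1/q)
  sum(abs(diff_means)^q)^min(q, 1 / q)
}

X1 <- rbind(c(0, 0), c(2, 4))   # column means: 1, 2
X2 <- rbind(c(1, 1), c(3, 7))   # column means: 2, 4
engineer_sketch(X1, X2, q = 1)  # |1 - 2| + |2 - 4| = 3
engineer_sketch(X1, X2, q = 2)  # (1 + 4)^(1/2) = sqrt(5)
```

For q = 1 the value is the sum of absolute mean differences; for q = 2 the exponent min(q, 1/q) = 1/2 gives the Euclidean distance between the two mean vectors.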
Value
An object of class htest with the following components:
method |
Description of the test |
statistic |
Observed value of the test statistic |
data.name |
The dataset names |
alternative |
The alternative hypothesis |
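As a brief sketch of how the returned components might be inspected, the following assumes that the DataSimilarity package is installed and is skipped otherwise. The component names are the ones listed above; the data are the same as in the Examples.

```r
# Assumes the DataSimilarity package is installed; does nothing otherwise.
if (requireNamespace("DataSimilarity", quietly = TRUE)) {
  set.seed(1234)
  X1 <- matrix(rnorm(1000), ncol = 10)
  X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
  res <- DataSimilarity::engineerMetric(X1, X2)
  print(inherits(res, "htest"))  # class documented above
  print(res$statistic)           # observed value of the statistic
  print(res$method)              # description of the test
  print(res$data.name)           # the dataset names
}
```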
Applicability
Target variable? | Numeric? | Categorical? | K-sample? |
No | Yes | No | No |
Note
The seed argument is only included for consistency with other methods. The result of the metric calculation is deterministic.
References
Rachev, S. T. (1991). Probability metrics and the stability of stochastic models. John Wiley & Sons, Chichester.
Stolte, M., Kappenberg, F., Rahnenführer, J., Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statist. Surv. 18, 163-298. doi:10.1214/24-SS149
See Also
Examples
set.seed(1234)
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Calculate engineer metric
engineerMetric(X1, X2)