gTests {DataSimilarity} | R Documentation |
Graph-Based Tests
Description
Performs the edge-count two-sample tests for multivariate data implemented in g.tests from the gTests package. This function is intended for use in, e.g., comparison studies where all four graph-based tests need to be calculated at the same time. Since large parts of the calculations coincide, using this function should be faster than computing all four statistics individually.
Usage
gTests(X1, X2, dist.fun = stats::dist, graph.fun = MST,
n.perm = 0, dist.args = NULL, graph.args = NULL,
maxtype.kappa = 1.14, seed = NULL)
Arguments
X1 |
First dataset as matrix or data.frame |
X2 |
Second dataset as matrix or data.frame |
dist.fun |
Function for calculating a distance matrix on the pooled dataset (default: stats::dist) |
graph.fun |
Function for calculating a similarity graph using the distance matrix on the pooled sample (default: MST) |
n.perm |
Number of permutations for permutation test (default: 0, asymptotic test is performed). |
dist.args |
Named list of further arguments passed to dist.fun (default: NULL) |
graph.args |
Named list of further arguments passed to graph.fun (default: NULL) |
maxtype.kappa |
Parameter kappa of the maxtype edge-count test (default: 1.14) |
seed |
Random seed (default: NULL). A random seed will only be set if one is provided. |
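As a rough illustration of how a named list such as dist.args can reach dist.fun, the following base R sketch merges the pooled data with the extra arguments via do.call. Whether gTests forwards its arguments in exactly this way is an assumption; the simulated data and the choice of method = "manhattan" are purely illustrative.

```r
set.seed(1)
X1 <- matrix(rnorm(40), ncol = 4)
X2 <- matrix(rnorm(40, mean = 0.5), ncol = 4)
pooled <- rbind(X1, X2)

# Illustrative extra arguments for the distance function
dist.args <- list(method = "manhattan")

# Assumed forwarding step (not confirmed for gTests): data first, extras after
D <- do.call(stats::dist, c(list(pooled), dist.args))

# The equivalent direct call, for comparison
D_direct <- stats::dist(pooled, method = "manhattan")
```

With the package installed, the corresponding call would presumably be gTests(X1, X2, dist.args = list(method = "manhattan")).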
Details
The original, weighted, generalized, and maxtype edge-count tests are performed.
For n.perm = 0, an asymptotic test using the asymptotic normal approximation of the null distribution is performed. For n.perm > 0, a permutation test is performed.
This implementation is a wrapper around g.tests that modifies the input and output of that function to match the other functions provided in this package. For more details, see the documentation of g.tests.
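To make the permutation variant more concrete, the following base R sketch computes only the original edge-count statistic, i.e. the number of minimum spanning tree edges that join observations from different samples, and approximates its null distribution by permuting the sample labels. It is not the code used by g.tests. The helper names mst_edges and between_edges are hypothetical, the simulated data are illustrative, and the "+ 1" correction in the p-value is a common convention that may differ from what the package does. The weighted, generalized, and maxtype statistics are not covered.

```r
# Prim's algorithm on a full distance matrix (hypothetical helper);
# returns an (n - 1) x 2 integer matrix of minimum spanning tree edges.
mst_edges <- function(D) {
  n <- nrow(D)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  best_dist <- D[1, ]       # cheapest known connection of each node to the tree
  best_from <- rep(1L, n)   # tree node that realises this connection
  edges <- matrix(NA_integer_, nrow = n - 1, ncol = 2)
  for (k in seq_len(n - 1)) {
    cand <- which(!in_tree)
    j <- cand[which.min(best_dist[cand])]
    edges[k, ] <- c(best_from[j], j)
    in_tree[j] <- TRUE
    closer <- (!in_tree) & (D[j, ] < best_dist)
    best_dist[closer] <- D[j, closer]
    best_from[closer] <- j
  }
  edges
}

# Count edges whose endpoints carry different sample labels (hypothetical helper)
between_edges <- function(edges, labels) {
  sum(labels[edges[, 1]] != labels[edges[, 2]])
}

set.seed(1234)
X1 <- matrix(rnorm(200), ncol = 4)
X2 <- matrix(rnorm(200, mean = 1), ncol = 4)
pooled <- rbind(X1, X2)
labels <- rep(c(1L, 2L), times = c(nrow(X1), nrow(X2)))

# The graph is built once on the pooled sample; only the labels are permuted
D <- as.matrix(stats::dist(pooled))
edges <- mst_edges(D)
r_obs <- between_edges(edges, labels)

# Few between-sample edges speak against equal distributions,
# so the p-value uses the lower tail of the permutation distribution
n_perm <- 200
r_perm <- replicate(n_perm, between_edges(edges, sample(labels)))
p_perm <- (sum(r_perm <= r_obs) + 1) / (n_perm + 1)
```

Because the similarity graph depends only on the pooled data, each permutation needs just a recount of between-sample edges and no new graph.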
Value
A list with the following components:
statistic |
Observed values of the test statistics |
p.value |
Asymptotic or permutation p values |
alternative |
The alternative hypothesis |
method |
Description of the test |
data.name |
The dataset names |
Applicability
Target variable? | Numeric? | Categorical? | K-sample? |
No | Yes | No | No |
References
Friedman, J. H., and Rafsky, L. C. (1979). Multivariate Generalizations of the Wald-Wolfowitz and Smirnov Two-Sample Tests. The Annals of Statistics, 7(4), 697-717.
Chen, H., and Friedman, J. H. (2017). A New Graph-Based Two-Sample Test for Multivariate and Object Data. Journal of the American Statistical Association, 112(517), 397-409. doi:10.1080/01621459.2016.1147356
Chen, H., Chen, X., and Su, Y. (2018). A Weighted Edge-Count Two-Sample Test for Multivariate and Object Data. Journal of the American Statistical Association, 113(523), 1146-1155. doi:10.1080/01621459.2017.1307757
Zhang, J., and Chen, H. (2022). Graph-Based Two-Sample Tests for Data with Repeated Observations. Statistica Sinica, 32, 391-415. doi:10.5705/ss.202019.0116
Chen, H., and Zhang, J. (2017). gTests: Graph-Based Two-Sample Tests. R package version 0.2, https://CRAN.R-project.org/package=gTests.
Stolte, M., Kappenberg, F., Rahnenführer, J., and Bommert, A. (2024). Methods for quantifying dataset similarity: a review, taxonomy and comparison. Statistics Surveys, 18, 163-298. doi:10.1214/24-SS149
See Also
FR for the original edge-count test, CF for the generalized edge-count test, CCS for the weighted edge-count test, and ZC for the maxtype edge-count test; gTests_cat, CCS_cat, FR_cat, CF_cat, and ZC_cat for versions of the tests for categorical data.
Examples
set.seed(1234)
# Draw some data
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)
# Perform edge-count tests
if (requireNamespace("gTests", quietly = TRUE)) {
  gTests(X1, X2)
}
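The example above uses the asymptotic test (n.perm = 0). A hedged variant that requests permutation tests is sketched below. It assumes that both the DataSimilarity and gTests packages are installed and does nothing otherwise; the values n.perm = 100 and seed = 42 are arbitrary illustrative choices.

```r
set.seed(1234)
# Draw some data, as in the example above
X1 <- matrix(rnorm(1000), ncol = 10)
X2 <- matrix(rnorm(1000, mean = 0.5), ncol = 10)

res <- NULL
if (requireNamespace("DataSimilarity", quietly = TRUE) &&
    requireNamespace("gTests", quietly = TRUE)) {
  # n.perm > 0 requests permutation tests; seed makes the permutations reproducible
  res <- DataSimilarity::gTests(X1, X2, n.perm = 100, seed = 42)
  print(res$statistic)  # observed values of the test statistics
  print(res$p.value)    # permutation p values
}
```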