visstat_core {visStatistics} | R Documentation |
Automated Visualization of Statistical Hypothesis Testing
Description
visstat_core() provides automated selection and visualization of a
statistical hypothesis test between two vectors in a given data.frame named
dataframe, based on the data's type, distribution, sample size, and the
specified conf.level.
varsample and varfactor are character strings corresponding to the column
names of the chosen vectors in dataframe. These vectors must be of type
integer, numeric or factor.
The automatically generated output figures illustrate the selected
statistical hypothesis test, display the main test statistics, and include
assumption checks and post hoc comparisons when applicable. The primary test
results are returned as a list object.
Usage
visstat_core(
dataframe,
varsample,
varfactor,
conf.level = 0.95,
numbers = TRUE,
minpercent = 0.05,
graphicsoutput = NULL,
plotName = NULL,
plotDirectory = getwd()
)
Arguments
dataframe |
|
varsample |
|
varfactor |
|
conf.level |
Confidence level |
numbers |
a logical indicating whether to show numbers in mosaic count plots. |
minpercent |
number between 0 and 1 indicating minimal fraction of total count data of a category to be displayed in mosaic count plots. |
graphicsoutput |
saves plot(s) of type "png", "jpg", "tiff" or "bmp"
in directory specified in |
plotName |
graphical output is stored following the naming convention
"plotName.graphicsoutput" in |
plotDirectory |
specifies directory, where generated plots are stored. Default is current working directory. |
Details
The decision logic for selecting a statistical test is described below.
For more details, please refer to the package's vignette("visStatistics").
Throughout, data of class numeric or integer are referred to as numeric,
while data of class factor are referred to as categorical.
The significance level alpha is defined as one minus the confidence level,
given by the argument conf.level. Assumptions of normality and
homoscedasticity are considered met when the corresponding test yields a
p-value greater than alpha = 1 - conf.level.
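The relation between conf.level, alpha and an assumption check can be sketched in base R as follows. This is a minimal illustration, not the package's internal code; the variable names p_normal and normality_met are hypothetical, and the built-in trees data set is used only as sample input.

```r
# Significance level derived from the confidence level
conf.level <- 0.95
alpha <- 1 - conf.level

# Shapiro-Wilk normality test on a built-in numeric vector
p_normal <- shapiro.test(trees$Height)$p.value

# The assumption counts as met when the p-value exceeds alpha
normality_met <- p_normal > alpha
normality_met
```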
The choice of statistical tests performed by visstat_core() depends on
whether the data are numeric or categorical, the number of levels in the
categorical variable, the distribution of the data, and the chosen
conf.level. The function prioritises interpretable visual output and
tests that remain valid under their assumptions, following the logic below:
(1) When the response is numerical and the predictor is categorical, tests of
central tendency are performed. If the predictor has two levels:
t.test() is used if both groups have more than 30 observations (Lumley
et al. (2002) <doi:10.1146/annurev.publhealth.23.100901.140546>). For smaller
samples, normality is assessed using shapiro.test(). If both groups
return p-values greater than alpha, t.test() is applied;
otherwise, wilcox.test() is used.
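The two-level branch described above can be sketched as a small selector. The helper choose_two_sample_test() is hypothetical and not part of visStatistics; it only returns the name of the test that the stated rules would pick and does not reproduce the package's plotting or return values.

```r
# Hypothetical helper (not part of visStatistics): name the two-sample test
# that the documented rules would select.
choose_two_sample_test <- function(x, g, conf.level = 0.95) {
  alpha <- 1 - conf.level
  groups <- split(x, droplevels(as.factor(g)))
  stopifnot(length(groups) == 2)

  # Both groups larger than 30 observations: t.test() without further checks
  if (all(lengths(groups) > 30)) {
    return("t.test")
  }

  # Smaller samples: Shapiro-Wilk test per group
  # (shapiro.test() needs between 3 and 5000 observations)
  p_norm <- vapply(groups, function(v) shapiro.test(v)$p.value, numeric(1))
  if (all(p_norm > alpha)) "t.test" else "wilcox.test"
}

# Usage on a built-in data set
choose_two_sample_test(mtcars$mpg, mtcars$am)
```

Returning the test name rather than running the test keeps the sketch focused on the decision itself.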
For predictors with more than two levels, aov() is initially fitted.
Residual normality is tested with shapiro.test() and ad.test().
If p > alpha for either test, normality is assumed. Homogeneity of
variance is tested with bartlett.test(). If p > alpha,
aov() with TukeyHSD() is used. If p <= alpha,
oneway.test() is applied with TukeyHSD(). If residuals are not
normal, kruskal.test() with pairwise.wilcox.test() is used.
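The branch for more than two levels can be sketched in the same way. The helper choose_multi_group_test() is hypothetical and not part of visStatistics. As a simplifying assumption it checks residual normality with shapiro.test() only, whereas the documented procedure also consults ad.test(), which is not in base R.

```r
# Hypothetical helper (not part of visStatistics): name the multi-group test
# that the documented rules would select.
choose_multi_group_test <- function(x, g, conf.level = 0.95) {
  alpha <- 1 - conf.level
  g <- droplevels(as.factor(g))

  # Fit the one-way ANOVA first and inspect its residuals
  fit <- aov(x ~ g)

  # Simplification: only shapiro.test() is consulted here
  p_norm <- shapiro.test(residuals(fit))$p.value
  if (p_norm <= alpha) {
    return("kruskal.test")
  }

  # Residuals look normal: check homogeneity of variance
  p_var <- bartlett.test(x, g)$p.value
  if (p_var > alpha) "aov" else "oneway.test"
}

# Usage on a built-in data set
choose_multi_group_test(npk$yield, npk$block)
```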
(2) When both the response and predictor are numerical, a linear model
lm() is fitted, with residual diagnostics and a confidence band plot.
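A linear fit with a confidence band and a residual check can be reproduced in base R roughly as below. This is a sketch under assumptions, not the plot that visstat_core() draws: the choice of the trees data set, the 0.99 level, and the names new_x and band are illustrative.

```r
# Fit a simple linear model on a built-in data set
fit <- lm(Girth ~ Height, data = trees)

# Evaluate the fit and its confidence band on a grid of predictor values
new_x <- data.frame(
  Height = seq(min(trees$Height), max(trees$Height), length.out = 50)
)
band <- predict(fit, newdata = new_x, interval = "confidence", level = 0.99)

# Scatter plot with fitted line (solid) and confidence band (dashed)
plot(Girth ~ Height, data = trees)
lines(new_x$Height, band[, "fit"])
lines(new_x$Height, band[, "lwr"], lty = 2)
lines(new_x$Height, band[, "upr"], lty = 2)

# Residual diagnostic: Shapiro-Wilk test on the model residuals
shapiro.test(residuals(fit))
```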
(3) When both variables are categorical, visstat_core() uses
chisq.test() or fisher.test() depending on expected counts,
following Cochran's rule (Cochran (1954) <doi:10.2307/3001666>).
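The choice between the two tests can be sketched from the expected counts. The helper choose_count_test() is hypothetical, and the thresholds are an assumption: the sketch uses a common formulation of Cochran's rule (no expected count below 1 and at most 20% of expected counts below 5), which may differ from the exact criterion implemented in the package.

```r
# Hypothetical helper (not part of visStatistics): name the test for a
# contingency table based on its expected counts.
choose_count_test <- function(tab) {
  # chisq.test() warns when expected counts are small; only the expected
  # counts are needed here, so the warning is suppressed
  expected <- suppressWarnings(chisq.test(tab)$expected)

  # Assumed formulation of Cochran's rule
  cochran_ok <- all(expected >= 1) && mean(expected < 5) <= 0.2
  if (cochran_ok) "chisq.test" else "fisher.test"
}

# Usage: hair by eye colour, summed over sex
hair_eye <- margin.table(HairEyeColor, c(1, 2))
choose_count_test(hair_eye)
```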
Implemented main tests:
t.test(), wilcox.test(), aov(), oneway.test(), lm(), kruskal.test(),
fisher.test(), chisq.test().
Implemented tests for assumptions:
Normality: shapiro.test() and ad.test()
Heteroscedasticity: bartlett.test()
Implemented post hoc tests:
- TukeyHSD() for aov() and oneway.test()
- pairwise.wilcox.test() for kruskal.test()
Value
A list containing the statistics of the automatically selected test
meeting assumptions. All values are returned as invisible copies.
Values can be accessed by assigning the return value of visstat_core()
to a variable.
See Also
See also the package's vignette vignette("visStatistics") for the overview,
and the accompanying webpage https://shhschilling.github.io/visStatistics/.
Examples
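## Welch Two Sample t-test (t.test())
mtcars$am <- as.factor(mtcars$am) # am is stored as numeric; a factor is needed for a group comparison
visstat_core(mtcars, "mpg", "am")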
## Wilcoxon rank sum test (wilcox.test())
grades_gender <- data.frame(
  Sex = as.factor(c(rep("Girl", 20), rep("Boy", 20))),
  Grade = c(
    19.3, 18.1, 15.2, 18.3, 7.9, 6.2, 19.4,
    20.3, 9.3, 11.3, 18.2, 17.5, 10.2, 20.1, 13.3, 17.2, 15.1, 16.2, 17.3,
    16.5, 5.1, 15.3, 17.1, 14.8, 15.4, 14.4, 7.5, 15.5, 6.0, 17.4,
    7.3, 14.3, 13.5, 8.0, 19.5, 13.4, 17.9, 17.7, 16.4, 15.6
  )
)
visstat_core(grades_gender, "Grade", "Sex")
## Welch's oneway ANOVA not assuming equal variances (oneway.test())
anova_npk <- visstat_core(npk, "yield", "block")
anova_npk # prints summary of tests
## Kruskal-Wallis rank sum test (kruskal.test())
visstat_core(iris, "Petal.Width", "Species")
visstat_core(InsectSprays, "count", "spray")
## Linear regression (lm())
visstat_core(trees, "Girth", "Height", conf.level = 0.99)
## Pearson's Chi-squared test (chisq.test())
### Transform array to data.frame
HairEyeColorDataFrame <- counts_to_cases(as.data.frame(HairEyeColor))
visstat_core(HairEyeColorDataFrame, "Hair", "Eye")
## Fisher's exact test (fisher.test())
HairEyeColorMaleFisher <- HairEyeColor[, , 1]
### slicing out a 2 x 2 contingency table
blackBrownHazelGreen <- HairEyeColorMaleFisher[1:2, 3:4]
blackBrownHazelGreen <- counts_to_cases(as.data.frame(blackBrownHazelGreen))
fisher_stats <- visstat_core(blackBrownHazelGreen, "Hair", "Eye")
fisher_stats # print out summary statistics
## Saving the graphical output in directory "plotDirectory"
## A) Saving graphical output of type "png" in temporary directory tempdir()
## with default naming convention:
visstat_core(blackBrownHazelGreen, "Hair", "Eye",
  graphicsoutput = "png",
  plotDirectory = tempdir()
)
## Remove graphical output from plotDirectory
file.remove(file.path(tempdir(), "chi_squared_or_fisher_Hair_Eye.png"))
file.remove(file.path(tempdir(), "mosaic_complete_Hair_Eye.png"))
## B) Specifying pdf as output type:
visstat_core(iris, "Petal.Width", "Species",
  graphicsoutput = "pdf",
  plotDirectory = tempdir()
)
## Remove graphical output from plotDirectory
file.remove(file.path(tempdir(), "kruskal_Petal_Width_Species.pdf"))
## C) Specifying "plotName" overwrites default naming convention
visstat_core(iris, "Petal.Width", "Species",
  graphicsoutput = "pdf",
  plotName = "kruskal_iris", plotDirectory = tempdir()
)
## Remove graphical output from plotDirectory
file.remove(file.path(tempdir(), "kruskal_iris.pdf"))