check_efa {HMDA} | R Documentation |
Check Exploratory Factor Analysis Suitability
Description
Checks whether specified features in a dataframe meet the criteria for exploratory factor analysis (EFA). The function verifies that each feature exists, is numeric, has sufficient variability, and does not have an excessive proportion of missing values. When more than one feature is supplied, it also checks whether the correlation matrix is full rank and whether each feature is sufficiently correlated with the others.
Usage
check_efa(
  df,
  features,
  min_unique = 5,
  min_intercorrelation = 0.3,
  verbose = FALSE
)
Arguments
df |
A dataframe containing the features. |
features |
A character vector of feature names to be evaluated. |
min_unique |
An integer specifying the minimum number of unique non-missing values required for a feature. Default is 5. |
min_intercorrelation |
A numeric threshold for the minimum acceptable intercorrelation among features. (Note: this parameter is not used explicitly in the current implementation.) Default is 0.3. |
verbose |
Logical; if |
Details
The function performs several checks:
- Existence
Verifies that each feature in features is present in df.
- Numeric Type
Checks that each feature is numeric.
- Variability
Ensures that each feature has at least min_unique unique non-missing values.
- Missing Values
Flags features with more than 20% missing values.
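The per-feature checks listed above can be sketched in base R as follows. This is a minimal illustration, not the HMDA implementation: the helper name check_features_sketch, its max_missing argument, and the wording of the messages are assumptions made for this example.

```r
# Hypothetical helper, not part of HMDA: re-creates the per-feature checks
# described above using base R only.
check_features_sketch <- function(df, features, min_unique = 5, max_missing = 0.2) {
  issues <- character(0)
  for (f in features) {
    # Existence: the column must be present in df
    if (!f %in% names(df)) {
      issues <- c(issues, sprintf("'%s' not found in df", f))
      next
    }
    x <- df[[f]]
    # Numeric type: later checks only make sense for numeric columns
    if (!is.numeric(x)) {
      issues <- c(issues, sprintf("'%s' is not numeric", f))
      next
    }
    # Variability: count unique non-missing values
    if (length(unique(x[!is.na(x)])) < min_unique) {
      issues <- c(issues, sprintf("'%s' has fewer than %d unique values", f, min_unique))
    }
    # Missingness: flag features whose share of NA exceeds max_missing (20%)
    if (isTRUE(mean(is.na(x)) > max_missing)) {
      issues <- c(issues, sprintf("'%s' has more than %.0f%% missing values", f, 100 * max_missing))
    }
  }
  issues
}

# Toy data: 'b' has too few unique values, 'c' has 30% missing,
# 'd' is not numeric, and 'e' does not exist.
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  b = c(1, 1, 2, 2, 1, 1, 2, 2, 1, 1),
  c = c(1, NA, NA, NA, 5, 6, 7, 8, 9, 10),
  d = letters[1:10]
)
issues <- check_features_sketch(toy, c("a", "b", "c", "d", "e"))
print(issues)
```

Returning the collected messages rather than stopping at the first failure mirrors the documented behaviour of reporting every issue found.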
If more than one feature is provided, the function computes the correlation matrix (using pairwise complete observations) and checks:
- Full Rank
Whether the correlation matrix is full rank. A rank lower than the number of features indicates redundancy.
- Intercorrelations
Identifies features that do not reach a correlation of at least 0.4 with any of the other features.
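The two matrix-level checks can be sketched in base R along the following lines. The helper name check_cor_sketch is hypothetical, and two details are assumptions rather than documented behaviour: the rank is taken from a QR decomposition, and correlations are compared with the 0.4 threshold in absolute value.

```r
# Hypothetical helper, not part of HMDA: sketches the two matrix-level checks.
check_cor_sketch <- function(df, features, threshold = 0.4) {
  stopifnot(length(features) > 1)
  # Correlation matrix from pairwise complete observations
  R <- cor(df[, features, drop = FALSE], use = "pairwise.complete.obs")
  # Full rank: numerical rank from the QR decomposition (assumed method)
  full_rank <- qr(R)$rank == length(features)
  # Intercorrelation: largest absolute off-diagonal correlation per feature
  # (taking the absolute value is an assumption)
  diag(R) <- NA
  max_r <- apply(abs(R), 1, max, na.rm = TRUE)
  weak <- names(max_r)[max_r < threshold]
  list(full_rank = full_rank, weakly_correlated = weak)
}

data("USJudgeRatings")
res <- check_cor_sketch(USJudgeRatings, colnames(USJudgeRatings)[-1])
str(res)
```

A rank below the number of features means that at least one feature is a linear combination of the others, which is the redundancy the full-rank check is meant to detect.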
Value
TRUE if all features are deemed suitable for EFA, and FALSE otherwise. In the latter case, messages detailing the issues are printed.
Author(s)
E. F. Haghish
Examples
# Example: assess feature suitability for EFA using the USJudgeRatings dataset.
# This dataset contains lawyers' ratings of state judges in the US Superior Court.
# Here, we check whether these rating variables are suitable for EFA.
data("USJudgeRatings")
features_to_check <- colnames(USJudgeRatings[,-1])
result <- check_efa(
  df = USJudgeRatings,
  features = features_to_check,
  min_unique = 3,
  verbose = TRUE
)
# TRUE indicates the features are suitable.
print(result)