calc_normalized_entropy {qtkit}    R Documentation
Calculate Normalized Entropy for Categorical Variables
Description
Computes the normalized entropy (uncertainty measure) for categorical variables, providing a standardized measure of dispersion or randomness in the data.
Usage
calc_normalized_entropy(x)
Arguments
x
A character vector or factor containing categorical data.
Details
The function:
Handles both character vectors and factors as input
Treats NA values as a separate category
Normalizes entropy to the range [0, 1], where:
0 indicates complete certainty (all observations fall in a single category)
1 indicates maximum uncertainty (all categories are equally frequent)
The calculation process:
Computes category proportions
Calculates raw entropy using Shannon's formula
Normalizes by dividing by the maximum possible entropy (the logarithm of the number of categories)
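The three steps above can be sketched in base R. This is a minimal illustration, not the qtkit source: the name normalized_entropy_sketch is hypothetical, and three choices in it are assumptions. It uses the natural logarithm, returns 0 when fewer than two categories are present, and ignores unused factor levels.

```r
# Hypothetical re-implementation for illustration only; not the qtkit source.
normalized_entropy_sketch <- function(x) {
  # Count each category; useNA = "ifany" keeps NA as its own category
  counts <- table(x, useNA = "ifany")
  # Assumption: unused factor levels do not count as categories
  counts <- counts[counts > 0]
  # Step 1: category proportions
  p <- as.numeric(counts) / sum(counts)
  k <- length(p)
  # Assumption: with fewer than two categories there is no uncertainty
  if (k < 2) return(0)
  # Step 2: raw Shannon entropy, H = -sum(p * log(p))
  h <- -sum(p * log(p))
  # Step 3: divide by the maximum possible entropy, log(k)
  h / log(k)
}

normalized_entropy_sketch(c("A", "B", "B", "C", "C", "C", "D", "D", "D", "D"))
```

Because the same logarithm appears in the numerator and the denominator, the base (natural, 2, or 10) does not change the normalized result. For the vector shown, this sketch gives roughly 0.92, reflecting four categories with moderately unequal counts.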
Value
A numeric value between 0 and 1 representing the normalized entropy:
Values closer to 0 indicate less diversity/uncertainty
Values closer to 1 indicate more diversity/uncertainty
Examples
# Calculate entropy for a simple categorical vector
x <- c("A", "B", "B", "C", "C", "C", "D", "D", "D", "D")
calc_normalized_entropy(x)
# Handle missing values
y <- c("A", "B", NA, "C", "C", NA, "D", "D")
calc_normalized_entropy(y)
# Works with factors too
z <- factor(c("Low", "Med", "Med", "High", "High", "High"))
calc_normalized_entropy(z)