LCA {exametrika} | R Documentation |
Latent Class Analysis
Description
Performs Latent Class Analysis (LCA) on binary response data using the Expectation-Maximization (EM) algorithm. LCA identifies unobserved (latent) subgroups of examinees with similar response patterns, and estimates both the class characteristics and individual membership probabilities.
Usage
LCA(U, ncls = 2, na = NULL, Z = NULL, w = NULL, maxiter = 100)
Arguments
U |
Either an object of class "exametrika" or raw data. When raw data is given,
it is converted to the exametrika class with the |
ncls |
Number of latent classes to identify (between 2 and 20). Default is 2. |
na |
Values to be treated as missing values. |
Z |
Missing indicator matrix of type matrix or data.frame. Values of 1 indicate observed responses, while 0 indicates missing data. |
w |
Item weight vector specifying the relative importance of each item. |
maxiter |
Maximum number of EM algorithm iterations. Default is 100. |
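As an illustration of the coding described for Z (1 for observed, 0 for missing), the following base R sketch builds such an indicator matrix from a small response matrix. The data and the object names U_raw and Z_example are hypothetical; how LCA combines Z with na internally is not shown here.

```r
# Hypothetical 4 x 3 binary response matrix with two missing entries
U_raw <- matrix(c(1,  0, NA,
                  0,  1,  1,
                  1, NA,  0,
                  1,  1,  1), nrow = 4, byrow = TRUE)

# Missing indicator: 1 where a response was observed, 0 where it is missing
Z_example <- (!is.na(U_raw)) * 1

Z_example
```

A matrix of this shape matches the description of the Z argument above.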
Details
Latent Class Analysis is a statistical method for identifying unobserved subgroups within a population based on observed response patterns. It assumes that examinees belong to one of several distinct latent classes, and that the probability of a correct response to each item depends on class membership.
The algorithm proceeds by:
1. Initializing class reference probabilities
2. Computing posterior class membership probabilities for each examinee (E-step)
3. Re-estimating class reference probabilities based on these memberships (M-step)
4. Iterating until convergence or reaching the maximum number of iterations
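The steps above can be sketched in base R as follows. This is a minimal illustration of a generic EM scheme for binary latent class models, not the exametrika implementation: the function name lca_em_sketch, the initialization, the estimated class proportions (prior), the tolerance tol, and the clamping bounds are all assumptions made for the example, and missing data and item weights are ignored.

```r
# Minimal EM sketch for a binary latent class model (illustrative only;
# not the code used by exametrika::LCA).
lca_em_sketch <- function(U, ncls = 2, maxiter = 100, tol = 1e-6) {
  U <- as.matrix(U)
  J <- ncol(U)

  # Step 1 (assumed initialization): class c starts with probability
  # c / (ncls + 1) on every item; class proportions start equal.
  p <- matrix(seq_len(ncls) / (ncls + 1), nrow = ncls, ncol = J)
  prior <- rep(1 / ncls, ncls)
  loglik_old <- -Inf

  for (iter in seq_len(maxiter)) {
    # Step 2 (E-step): log joint of each response pattern and each class,
    # then normalize per examinee to get posterior membership probabilities.
    logl <- U %*% t(log(p)) + (1 - U) %*% t(log(1 - p))  # n x ncls
    logl <- sweep(logl, 2, log(prior), "+")
    m <- apply(logl, 1, max)          # row maxima for numerical stability
    post <- exp(logl - m)
    rs <- rowSums(post)
    post <- post / rs
    loglik <- sum(m + log(rs))

    # Step 3 (M-step): posterior-weighted proportion correct per class/item.
    prior <- colMeans(post)
    p <- (t(post) %*% U) / colSums(post)      # ncls x J
    p <- pmin(pmax(p, 1e-4), 1 - 1e-4)        # keep logs finite

    # Step 4: stop when the log-likelihood change is negligible.
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }

  list(posterior = post, class_probs = p, prior = prior,
       n_iter = iter, loglik = loglik)
}

# Usage on simulated (hypothetical) data: 200 examinees, 6 items
set.seed(1)
U_sim <- matrix(rbinom(200 * 6, 1, 0.5), nrow = 200)
fit <- lca_em_sketch(U_sim, ncls = 3, maxiter = 50)
head(round(fit$posterior, 3))
```

In this sketch the rows of fit$posterior sum to 1, and fit$class_probs holds one row per class and one column per item.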
Unlike Item Response Theory (IRT), LCA treats latent variables as categorical rather than continuous, identifying distinct profiles rather than positions on a continuum.
Value
An object of class "exametrika" and "LCA" containing:
- testlength
Length of the test (number of items).
- nobs
Sample size (number of rows in the dataset).
- Nclass
Number of latent classes specified.
- N_Cycle
Number of EM algorithm iterations performed.
- TRP
Test Reference Profile vector showing expected scores for each latent class. Calculated as the column sum of the estimated class reference matrix.
- LCD
Latent Class Distribution vector showing the number of examinees assigned to each latent class.
- CMD
Class Membership Distribution vector showing the sum of membership probabilities for each latent class.
- Students
Class Membership Profile matrix showing the posterior probability of each examinee belonging to each latent class. The last column ("Estimate") indicates the most likely class assignment.
- IRP
Item Reference Profile matrix where each row represents an item and each column represents a latent class. Values indicate the probability of a correct response for members of that class.
- ItemFitIndices
Fit indices for each item. See also ItemFit.
- TestFitIndices
Overall fit indices for the test. See also TestFit.
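To make the relationship between several of these components concrete, the sketch below derives an "Estimate" column, LCD, CMD, and TRP from a small posterior membership matrix and item reference profile. All numbers and object names (post, irp, and so on) are hypothetical, and the TRP line assumes unit item weights; the package's own computation may differ, for example when w is supplied.

```r
# Hypothetical posterior membership matrix (3 examinees x 2 classes)
post <- matrix(c(0.9, 0.1,
                 0.3, 0.7,
                 0.6, 0.4), nrow = 3, byrow = TRUE)

# Hypothetical item reference profile (4 items x 2 classes)
irp <- matrix(c(0.2, 0.8,
                0.3, 0.9,
                0.1, 0.6,
                0.4, 0.7), nrow = 4, byrow = TRUE)

# Most likely class per examinee (the "Estimate" column of Students)
estimate <- max.col(post, ties.method = "first")   # 1 2 1

# LCD: number of examinees assigned to each class
lcd <- tabulate(estimate, nbins = ncol(post))      # 2 1

# CMD: sum of membership probabilities per class
cmd <- colSums(post)                               # 1.8 1.2

# TRP: column sums of the item reference profile (unit weights assumed)
trp <- colSums(irp)                                # 1 3
```

LCD counts hard assignments while CMD sums soft probabilities, so the two vectors generally differ but both total the sample size.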
References
Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61(2), 215-231.
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston: Houghton Mifflin.
Examples
# Fit a Latent Class Analysis model with 5 classes to the sample dataset
result.LCA <- LCA(J15S500, ncls = 5)
# Display the first few rows of student class membership probabilities
head(result.LCA$Students)
# Plot Item Reference Profiles (IRP) for items 1-6 in a 2x3 grid
# Shows probability of correct response for each item across classes
plot(result.LCA, type = "IRP", items = 1:6, nc = 2, nr = 3)
# Plot Class Membership Probabilities (CMP) for students 1-9 in a 3x3 grid
# Shows probability distribution of class membership for each student
plot(result.LCA, type = "CMP", students = 1:9, nc = 3, nr = 3)
# Plot Test Reference Profile (TRP) showing expected scores for each class
plot(result.LCA, type = "TRP")
# Plot Latent Class Distribution (LCD) showing class sizes
plot(result.LCA, type = "LCD")
# Compare models with different numbers of classes
# (In practice, you might try more class counts)
lca2 <- LCA(J15S500, ncls = 2)
lca3 <- LCA(J15S500, ncls = 3)
lca4 <- LCA(J15S500, ncls = 4)
lca5 <- LCA(J15S500, ncls = 5)
# Compare BIC values to select optimal number of classes
# (Lower BIC indicates better fit)
data.frame(
  Classes = 2:5,
  BIC = c(
    lca2$TestFitIndices$BIC,
    lca3$TestFitIndices$BIC,
    lca4$TestFitIndices$BIC,
    lca5$TestFitIndices$BIC
  )
)