determine_factors {TVMVP}    R Documentation
Determine the Optimal Number of Factors via an Information Criterion
Description
This function selects the optimal number of factors for a local principal component analysis (PCA) model of asset returns. It computes a BIC-type information criterion (IC) for each candidate number of factors, based on the sum of squared residuals (SSR) from the PCA reconstruction and a penalty term that increases with the number of factors. The optimal number of factors is the one that minimizes the IC. The procedure is available either as a stand-alone function or as a method of the 'TVMVP' R6 class.
Usage
determine_factors(returns, max_m, bandwidth = silverman(returns))
Arguments
returns
    A numeric matrix of asset returns with dimensions T × p (T time points in rows, p assets in columns).
max_m
    Integer. The maximum number of factors to consider.
bandwidth
    Numeric. Kernel bandwidth for local PCA. Default is Silverman's rule of thumb.
Details
Two usage styles:
# Function interface
determine_factors(returns, max_m = 5)

# R6 method interface
tv <- TVMVP$new()
tv$set_data(returns)
tv$determine_factors(max_m = 5)
tv$get_optimal_m()
tv$get_IC_values()
When using the method form, if 'max_m' or 'bandwidth' are omitted, they default to values stored in the object. Results are cached and retrievable via class methods.
For each candidate number of factors m (from 1 to max_m), the function:
- Performs a local PCA on the returns at each time point r = 1, \dots, T using m factors.
- Computes a reconstruction of the returns and the corresponding residuals:
\text{Residual}_r = R_r - F_r \Lambda_r,
where R_r is the return at time r, and F_r and \Lambda_r are the local factors and loadings, respectively.
- Computes the average sum of squared residuals (SSR) as:
V(m) = \frac{1}{pT} \sum_{r=1}^{T} \| \text{Residual}_r \|^2.
- Adds a penalty term that increases with m:
\text{Penalty}(m) = m \times \frac{p + T \times \text{bandwidth}}{pT \times \text{bandwidth}} \log\left(\frac{pT \times \text{bandwidth}}{p + T \times \text{bandwidth}}\right).
The information criterion is defined as:
\text{IC}(m) = \log\big(V(m)\big) + \text{Penalty}(m).
The optimal number of factors is then chosen as the value of m that minimizes \text{IC}(m).
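The steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the Epanechnikov kernel, the weight normalization, the fixed bandwidth h = 0.2, and the helper names epanechnikov, local_residual_ss, and ic_penalty are all hypothetical choices made for this example. Only the penalty and IC formulas are taken from the text.

```r
# Hypothetical kernel choice (assumption): Epanechnikov on [-1, 1].
epanechnikov <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Squared residual norm at time r from a kernel-weighted (local) PCA
# with m factors. The normalization is a simplification for illustration.
local_residual_ss <- function(returns, r, m, h) {
  T_obs <- nrow(returns)
  w <- epanechnikov((seq_len(T_obs) - r) / (T_obs * h))
  w <- w / sum(w)                      # weights sum to one
  X <- sqrt(w) * returns               # scale row t by sqrt(w[t])
  V <- svd(X, nu = 0, nv = m)$v        # p x m local loading directions
  x_r <- returns[r, ]
  resid <- x_r - V %*% crossprod(V, x_r)  # R_r minus its rank-m reconstruction
  sum(resid^2)
}

# Penalty(m) exactly as written in the formula above.
ic_penalty <- function(m, p, T_obs, h) {
  Th <- T_obs * h
  m * (p + Th) / (p * Th) * log((p * Th) / (p + Th))
}

set.seed(123)
returns <- matrix(rnorm(100 * 30), nrow = 100, ncol = 30)
T_obs <- nrow(returns)
p <- ncol(returns)
h <- 0.2      # illustrative bandwidth, not the package default
max_m <- 5

# V(m): average squared residual over all time points and assets.
v <- sapply(seq_len(max_m), function(m) {
  ss <- sapply(seq_len(T_obs), function(r) local_residual_ss(returns, r, m, h))
  sum(ss) / (p * T_obs)
})

ic <- log(v) + ic_penalty(seq_len(max_m), p, T_obs, h)
optimal_m <- which.min(ic)
```

V(m) can only decrease as m grows, because each additional factor enlarges the subspace used for reconstruction. The penalty therefore does the work of stopping the criterion from always selecting max_m.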
Value
A list with:
- optimal_m: Integer. The optimal number of factors.
- IC_values: Numeric vector of IC values for each candidate m.
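As a rough sketch of how the returned list might be inspected, the snippet below builds a stand-in list with the two documented elements and tabulates the criterion by candidate m. The numbers are made up for illustration and are not output from determine_factors(). The snippet also assumes that IC_values is ordered by m = 1, ..., max_m.

```r
# Stand-in for the documented return value; the IC values are illustrative only.
result <- list(
  optimal_m = 2L,
  IC_values = c(0.31, 0.27, 0.35, 0.44, 0.52)
)

# Assumption: the i-th entry of IC_values corresponds to m = i.
candidates <- seq_along(result$IC_values)
best <- candidates[which.min(result$IC_values)]

ic_table <- data.frame(
  m = candidates,
  IC = result$IC_values,
  selected = candidates == best
)
print(ic_table)
```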
References
Su, L., & Wang, X. (2017). On time-varying factor models: Estimation and testing. Journal of Econometrics, 198(1), 84–101.
Examples
set.seed(123)
returns <- matrix(rnorm(100 * 30), nrow = 100, ncol = 30)
# Function usage
result <- determine_factors(returns, max_m = 5)
print(result$optimal_m)
print(result$IC_values)
# R6 usage
tv <- TVMVP$new()
tv$set_data(returns)
tv$determine_factors(max_m = 5)
tv$get_optimal_m()
tv$get_IC_values()