tsbalancing {gseries} | R Documentation |
Restore cross-sectional (contemporaneous) linear constraints
Description
(version française: https://StatCan.github.io/gensol-gseries/fr/reference/tsbalancing.html)
Replication of the G-Series 2.0 SAS^\circledR
GSeriesTSBalancing macro. See the
G-Series 2.0 documentation for details (Statistics Canada 2016).
This function balances (reconciles) a system of time series according to a set of linear constraints. The balancing solution is obtained by solving one or several quadratic minimization problems (see section Details) with the OSQP solver (Stellato et al. 2020). Given the feasibility of the balancing problem(s), the resulting time series data respect the specified constraints for every time period. Linear equality and inequality constraints are allowed. Optionally, the preservation of temporal totals may also be specified.
Usage
tsbalancing(
in_ts,
problem_specs_df,
temporal_grp_periodicity = 1,
temporal_grp_start = 1,
osqp_settings_df = default_osqp_sequence,
display_level = 1,
alter_pos = 1,
alter_neg = 1,
alter_mix = 1,
alter_temporal = 0,
lower_bound = -Inf,
upper_bound = Inf,
tolV = 0,
tolV_temporal = 0,
tolP_temporal = NA,
# New in G-Series 3.0
validation_tol = 0.001,
trunc_to_zero_tol = validation_tol,
full_sequence = FALSE,
validation_only = FALSE,
quiet = FALSE
)
Arguments
in_ts |
(mandatory) Time series (object of class "ts" or "mts") that contains the time series data to be reconciled. They are the balancing problems' input data (initial solutions). | ||||||||||||||||||||||||||
problem_specs_df |
(mandatory) Balancing problem specifications data frame (object of class "data.frame"). Using a sparse format inspired from the
SAS/OR The information is provided using four mandatory variables (
Note that empty strings (
Finally, the following table lists valid aliases for the
Reviewing the Examples should help conceptualize the balancing problem specifications data frame. | ||||||||||||||||||||||||||
temporal_grp_periodicity |
(optional) Positive integer defining the number of periods in temporal groups for which the totals should be preserved.
E.g., specify Default value is | ||||||||||||||||||||||||||
temporal_grp_start |
(optional) Integer in the [1 .. Default value is | ||||||||||||||||||||||||||
osqp_settings_df |
(optional) Data frame (object of class "data.frame") containing a sequence of OSQP settings for solving the balancing problems. The package includes two predefined OSQP settings sequence data frames:
See Default value is | ||||||||||||||||||||||||||
display_level |
(optional) Integer in the [0 .. 3] interval specifying the level of information to display in the console (
Default value is | ||||||||||||||||||||||||||
alter_pos |
(optional) Nonnegative real number specifying the default alterability coefficient associated to the values of time series with positive
coefficients in all balancing constraints in which they are involved (e.g., component series in aggregation table raking problems).
Alterability coefficients provided in the problem specification data frame (argument Default value is | ||||||||||||||||||||||||||
alter_neg |
(optional) Nonnegative real number specifying the default alterability coefficient associated to the values of time series with negative
coefficients in all balancing constraints in which they are involved (e.g., marginal totals in aggregation table raking problems).
Alterability coefficients provided in the problem specification data frame (argument Default value is | ||||||||||||||||||||||||||
alter_mix |
(optional) Nonnegative real number specifying the default alterability coefficient associated to the values of time series with a mix of
positive and negative coefficients in the balancing constraints in which they are involved. Alterability coefficients provided
in the problem specification data frame (argument Default value is | ||||||||||||||||||||||||||
alter_temporal |
(optional) Nonnegative real number specifying the default alterability coefficient associated to the time series temporal totals.
Alterability coefficients provided in the problem specification data frame (argument Default value is | ||||||||||||||||||||||||||
lower_bound |
(optional) Real number specifying the default lower bound for the time series values. Lower bounds provided in the problem specification
data frame (argument Default value is | ||||||||||||||||||||||||||
upper_bound |
(optional) Real number specifying the default upper bound for the time series values. Upper bounds provided in the problem specification
data frame (argument Default value is | ||||||||||||||||||||||||||
tolV |
(optional) Nonnegative real number specifying the tolerance, in absolute value, for the balancing constraints right-hand side (RHS) values:
where Default value is | ||||||||||||||||||||||||||
tolV_temporal , tolP_temporal |
(optional) Nonnegative real number, or
or
where Example: to set a tolerance of 10 units, specify
Default values are | ||||||||||||||||||||||||||
validation_tol |
(optional) Nonnegative real number specifying the tolerance for the validation of the balancing results. The function verifies if
the final (reconciled) time series values meet the constraints, allowing for discrepancies up to the value specified with
this argument. A warning is issued as soon as one constraint is not met (discrepancy greater than With constraints defined as Default value is | ||||||||||||||||||||||||||
trunc_to_zero_tol |
(optional) Nonnegative real number specifying the tolerance, in absolute value, for replacing by zero (small) values in the output
(reconciled) time series data (output object Note that the final constraint discrepancies (see argument Default value is | ||||||||||||||||||||||||||
full_sequence |
(optional) Logical argument specifying whether all the steps of the OSQP settings sequence data frame should be performed or not.
See argument Default value is | ||||||||||||||||||||||||||
validation_only |
(optional) Logical argument specifying whether the function should only perform input data validation or not. When
Default value is | ||||||||||||||||||||||||||
quiet |
(optional) Logical argument specifying whether or not to display only essential information such as warnings, errors and the period
(or set of periods) being reconciled. You could further suppress, if desired, the display of the balancing period(s)
information by wrapping your Default value is |
Details
This function solves one balancing problem per processing group (see section Processing groups for details). Each of these balancing problems is a quadratic minimization problem of the following form:
\displaystyle
\begin{aligned}
& \underset{\mathbf{x}}{\text{minimize}}
& & \mathbf{\left( y - x \right)}^{\mathrm{T}} W \mathbf{\left( y - x \right)} \\
& \text{subject to}
& & \mathbf{l} \le A \mathbf{x} \le \mathbf{u}
\end{aligned}
where
-
\mathbf{y}
is the vector of the initial problem values, i.e., the initial time series period values and, when applicable, temporal totals; -
\mathbf{x}
is the final (reconciled) version of vector\mathbf{y}
; matrix
W = \mathrm{diag} \left( \mathbf{w} \right)
with vector\mathbf{w}
elementsw_i = \left\{ \begin{array}{cl} 0 & \text{if } |c_i y_i| = 0 \\ \frac{1}{|c_i y_i|} & \text{otherwise} \end{array} \right.
, wherec_i
is the alterability coefficient of problem valuey_i
and cases corresponding to|c_i y_i| = 0
are fixed problem values (binding period values or temporal totals);matrix
A
and vectors\mathbf{l}
and\mathbf{u}
specify the balancing constraints, the implicit temporal total aggregation constraints (when applicable), the period value (upper and lower) bounds as well asx_i = y_i
constraints for fixedy_i
values\left( \left| c_i y_i \right| = 0 \right)
.
In practice, the objective function of the problem solved by OSQP excludes constant term \mathbf{y}^{\mathrm{T}} W
\mathbf{y}
, therefore corresponding to \mathbf{x}^{\mathrm{T}} W \mathbf{x} - 2 \left( \mathbf{w} \mathbf{y}
\right)^{\mathrm{T}} \mathbf{x}
, and the fixed y_i
values \left( \left| c_i y_i \right| = 0
\right)
are removed from the problem, adjusting the constraints accordingly, i.e.:
rows corresponding to the
x_i = y_i
constraints for fixedy_i
values are removed fromA
,\mathbf{l}
and\mathbf{u}
;columns corresponding to fixed
y_i
values are removed fromA
while appropriately adjusting\mathbf{l}
and\mathbf{u}
.
Alterability Coefficients
Alterability coefficients are nonnegative numbers that change the relative cost of modifying an initial problem value.
By changing the actual objective function to minimize, they allow the generation of a wide range of solutions. Since they
appear in the denominator of the objective function (matrix W
), the larger the alterability coefficient the less costly
it is to modify a problem value (period value or temporal total) and, conversely, the smaller the alterability coefficient
the more costly it becomes. This results in problem values with larger alterability coefficients proportionally changing more
than the ones with smaller alterability coefficients. Alterability coefficients of 0.0
define fixed (binding) problem
values while alterability coefficients greater than 0.0
define free (nonbinding) values. The default alterability
coefficients are 0.0
for temporal totals (argument alter_temporal
) and 1.0
for period values (arguments
alter_pos
, alter_neg
, alter_mix
). In the common case of aggregation table raking problems, the period values of the
marginal totals (time series with a coefficient of -1
in the balancing constraints) are usually binding (specified
with alter_neg = 0
) while the period values of the component series (time series with a coefficient of 1
in the
balancing constraints) are usually nonbinding (specified with alter_pos > 0
, e.g., alter_pos = 1
). Almost binding
problem values (e.g., marginal totals or temporal totals) can be obtained in practice by specifying very small (almost
0.0
) alterability coefficients relative to those of the other (nonbinding) problem values.
Temporal total preservation refers to the fact that temporal totals, when applicable, are usually kept “as close as possible” to their initial value. Pure preservation is achieved by default with binding temporal totals while the change is minimized with nonbinding temporal totals (in accordance with the set of alterability coefficients).
Validation and troubleshooting
Successful balancing problems (problems with a valid solution) have sol_status_val > 0
or, equivalently,
n_unmet_con = 0
or max_discr <= validation_tol
in the output proc_grp_df
data frame. Troubleshooting
unsuccessful balancing problems is not necessarily straightforward. Following are some suggestions:
Investigate the failed constraints (
unmet_flag = TRUE
or, equivalently,discr_out > validation_tol
in the outputprob_con_df
data frame) to make sure that they do not cause an empty solution space (infeasible problem).Change the OSQP solving sequence. E.g., try:
argument
full_sequence = TRUE
argument
osqp_settings_df = alternate_osqp_sequence
arguments
osqp_settings_df = alternate_osqp_sequence
andfull_sequence = TRUE
See
vignette("osqp-settings-sequence-dataframe")
for more details on this topic.Increase (review) the
validation_tol
value. Although this may sound like cheating, the defaultvalidation_tol
value (1 \times 10^{-3}
) may actually be too small for balancing problems that involve very large values (e.g., in billions) or, conversely, too large with very small problem values (e.g,< 1.0
). Multiplying the average scale of the problem data by the machine tolerance (.Machine$double.eps
) gives an approximation of the average size of the discrepancies thattsbalancing()
should be able to handle (distinguish from0
) and should probably constitute an absolute lower bound for argumentvalidation_tol
. In practice, a reasonablevalidation_tol
value would likely be1 \times 10^3
to1 \times 10^6
times larger than this lower bound.Address constraints redundancy. Multi-dimensional aggregation table raking problems are over-specified (involve redundant constraints) when all totals of all dimensions of the data cube are binding (fixed) and a constraint is defined for all of them. Redundancy also occurs for the implicit temporal aggregation constraints in single- or multi-dimensional aggregation table raking problems with binding (fixed) temporal totals. Over-specification is generally not an issue for
tsbalancing()
if the input data are not contradictory with regards to the redundant constraints, i.e., if there are no inconsistencies (discrepancies) associated to the redundant constraints in the input data or if they are negligible (reasonably small relative to the scale of the problem data). Otherwise, this may lead to unsuccessful balancing problems withtsbalancing()
. Possible solutions would then include:Resolve (or reduce) the discrepancies associated to the redundant constraints in the input data.
Select one marginal total in every dimension, but one, of the data cube and remove the corresponding balancing constraints from the problem. This cannot be done for the implicit temporal aggregation constraints.
Select one marginal total in every dimension, but one, of the data cube and make them nonbinding (alterability coefficient of, say,
1.0
).Do the same as (3) for the temporal totals of one of the inner-cube component series (make them nonbinding).
Make all marginal totals of every dimension, but one, of the data cube amlost binding, i.e., specify very small alterability coefficients (say
1 \times 10^{-6}
) compared to those of the inner-cube component series.Do the same as (5) for the temporal totals of all inner-cube component series (very small alterability coefficients, e.g., with argument
alter_temporal
).Use
tsraking()
(if applicable), which handles these inconsistencies by using the Moore-Penrose inverse (uniform distribution among all binding totals).
Solutions (2) to (7) above should only be considered if the discrepancies associated to the redundant constraints in the input data are reasonably small as they would be distributed among the omitted or nonbinding totals with
tsbalancing()
and all binding totals withtsraking()
. Otherwise, one should first investigate solution (1) above.Relax the bounds of the problem constraints, e.g.:
argument
tolV
for the balancing constraints;arguments
tolV_temporal
andtolP_temporal
for the implicit temporal aggregation constraints;arguments
lower_bound
andupper_bound
.
Value
The function returns is a list of seven objects:
-
out_ts
: modified version of the input time series object (class "ts" or "mts"; see argumentin_ts
) with the resulting reconciled time series values (primary function output). It can be explicitly coerced to another type of object with the appropriateas*()
function (e.g.,tsibble::as_tsibble()
would coerce it to a tsibble). -
proc_grp_df
: processing group summary data frame, useful to identify problems that have succeeded or failed. It contains one observation (row) for each balancing problem with the following columns:-
proc_grp
(num): processing group id. -
proc_grp_type
(chr): processing group type. Possible values are:-
"period"
; -
"temporal group"
.
-
-
proc_grp_label
(chr): string describing the processing group in the following format:-
"<year>-<period>"
(single periods) -
"<start year>-<start period> - <end year>-<end period>"
(temporal groups)
-
-
sol_status_val
,sol_status
(num, chr): solution status numerical (integer) value and description string:-
1
:"valid initial solution"
; -
-1
:"invalid initial solution"
; -
2
:"valid polished osqp solution"
; -
-2
:"invalid polished osqp solution"
; -
3
:"valid unpolished osqp solution"
; -
-3
:"invalid unpolished osqp solution"
; -
-4
:"unsolvable fixed problem"
(invalid initial solution).
-
-
n_unmet_con
(num): number of unmet constraints (sum(prob_conf_df$unmet_flag)
). -
max_discr
(num): maximum constraint discrepancy (max(prob_conf_df$discr_out)
). -
validation_tol
(num): specified tolerance for validation purposes (argumentvalidation_tol
). -
sol_type
(chr): returned solution type. Possible values are:-
"initial"
(initial solution, i.e., input data values); -
"osqp"
(OSQP solution).
-
-
osqp_attempts
(num): number of attempts made with OSQP (depth achieved in the solving sequence). -
osqp_seqno
(num): step # of the solving sequence corresponding to the returned solution.NA
whensol_type = "initial"
. -
osqp_status
(chr): OSQP status description string (osqp_sol_info_df$status
).NA
whensol_type = "initial"
. -
osqp_polished
(logi):TRUE
if the returned OSQP solution is polished (osqp_sol_info_df$status_polish = 1
),FALSE
otherwise.NA
whensol_type = "initial"
. -
total_solve_time
(num): total time, in seconds, of the solving sequence.
Column
proc_grp
constitutes a unique key (distinct rows) for the data frame. Successful balancing problems (problems with a valid solution) correspond to rows withsol_status_val > 0
or, equivalently, ton_unmet_con = 0
or tomax_discr <= validation_tol
. The initial solution (sol_type = "initial"
) is returned only if a) there are no initial constraint discrepancies, b) the problem is fixed (all values are binding) or c) it beats the OSQP solution (smaller total constraint discrepancies). The OSQP solving sequence is described invignette("osqp-settings-sequence-dataframe")
. -
-
periods_df
: time periods data frame, useful to match periods to processing groups. It contains one observation (row) for each period of the input time series object (argumentin_ts
) with the following columns:-
proc_grp
(num): processing group id. -
t
(num): time id (1:nrow(in_ts)
). -
time_val
(num): time value (stats::time(in_ts)
). It conceptually corresponds toyear + (period - 1) / frequency
.
Columns
t
andtime_val
both constitute a unique key (distinct rows) for the data frame. -
-
prob_val_df
: problem values data frame, useful to analyze change diagnostics, i.e., initial vs final (reconciled) values. It contains one observation (row) for each value involved in each balancing problem, with the following columns:-
proc_grp
(num): processing group id. -
val_type
(chr): problem value type. Possible values are:-
"period value"
; -
"temporal total"
.
-
-
name
(chr): time series (variable) name. -
t
(num): time id (1:nrow(in_ts)
); id of the first period of the temporal group for a temporal total. -
time_val
(num): time value (stats::time(in_ts)
); value of the first period of the temporal group for a temporal total. It conceptually corresponds toyear + (period - 1) / frequency
. -
lower_bd
,upper_bd
(num): period value bounds; always-Inf
andInf
for a temporal total. -
alter
(num): alterability coefficient. -
value_in
,value_out
(num): initial and final (reconciled) values. -
dif
(num):value_out - value_in
. -
rdif
(num):dif / value_in
;NA
ifvalue_in = 0
.
Columns
val_type + name + t
andval_type + name + time_val
both constitute a unique key (distinct rows) for the data frame. Binding (fixed) problem values correspond to rows withalter = 0
orvalue_in = 0
. Conversely, nonbinding (free) problem values correspond to rows withalter != 0
andvalue_in != 0
. -
-
prob_con_df
: problem constraints data frame, useful for troubleshooting problems that failed (identify unmet constraints). It contains one observation (row) for each constraint involved in each balancing problem, with the following columns:-
proc_grp
(num): processing group id. -
con_type
(chr): problem constraint type. Possible values are:-
"balancing constraint"
; -
"temporal aggregation constraint"
; -
"period value bounds"
.
While balancing constraints are specicied by the user, the other two types of constraints (temporal aggregation constraints and period value bounds) are automatically added to the problem by the function (when applicable).
-
-
name
(chr): constraint label or time series (variable) name. -
t
(num): time id (1:nrow(in_ts)
); id of the first period of the temporal group for a temporal aggregation constraint. -
time_val
(num): time value (stats::time(in_ts)
); value of the first period of the temporal group for a temporal aggregation constraint. It conceptually corresponds toyear + (period - 1) / frequency
. -
l
,u
,Ax_in
,Ax_out
(num): initial and final constraint elements\left( \mathbf{l} \le A \mathbf{x} \le \mathbf{u} \right)
. -
discr_in
,discr_out
(num): initial and final constraint discrepancies\left( \max \left( 0, \mathbf{l} - A \mathbf{x}, A \mathbf{x} - \mathbf{u} \right) \right)
. -
validation_tol
(num): specified tolerance for validation purposes (argumentvalidation_tol
). -
unmet_flag
(logi):TRUE
if the constraint is not met (discr_out > validation_tol
),FALSE
otherwise.
Columns
con_type + name + t
andcon_type + name + time_val
both constitute a unique key (distinct rows) for the data frame. Constraint bounds\mathbf{l = u}
forEQ
constraints,\mathbf{l} = -\infty
forLE
constraints,\mathbf{u} = \infty
forGE
constraints, and include the tolerances, when applicable, specified with argumentstolV
,tolV_temporal
andtolP_temporal
. -
-
osqp_settings_df
: OSQP settings data frame. It contains one observation (row) for each problem (processing group) solved with OSQP (proc_grp_df$sol_type = "osqp"
), with the following columns:-
proc_grp
(num): processing group id. one column corresponding to each element of the list returned by the
osqp::GetParams()
method applied to a OSQP solver object (class "osqp_model" object as returned byosqp::osqp()
), e.g.:Maximum iterations (
max_iter
);Primal and dual infeasibility tolerances (
eps_prim_inf
andeps_dual_inf
);Solution polishing flag (
polish
);Number of scaling iterations (
scaling
);etc.
extra settings specific to
tsbalancing()
:-
prior_scaling
(logi):TRUE
if the problem data were scaled (using the average of the free (nonbinding) problem values as the scaling factor) prior to solving with OSQP,FALSE
otherwise. -
require_polished
(logi):TRUE
if a polished solution from OSQP (osqp_sol_info_df$status_polish = 1
) was required for this step in order to end the solving sequence,FALSE
otherwise. Seevignette("osqp-settings-sequence-dataframe")
for more details on the solving sequence used bytsbalancing()
.
-
Column
proc_grp
constitutes a unique key (distinct rows) for the data frame. Visit https://osqp.org/docs/interfaces/solver_settings.html for all available OSQP settings. Problems (processing groups) for which the initial solution was returned (proc_grp_df$sol_type = "initial"
) are not included in this data frame. -
-
osqp_sol_info_df
: OSQP solution information data frame. It contains one observation (row) for each problem (processing group) solved with OSQP (proc_grp_df$sol_type = "osqp"
), with the following columns:-
proc_grp
(num): processing group id. one column corresponding to each element of the
info
list of a OSQP solver object (class "osqp_model" object as returned byosqp::osqp()
) after having been solved with theosqp::Solve()
method, e.g.:Solution status (
status
andstatus_val
);Polishing status (
status_polish
);Number of iterations (
iter
);Objective function value (
obj_val
);Primal and dual residuals (
pri_res
anddua_res
);Solve time (
solve_time
);etc.
extra information specific to
tsbalancing()
:-
prior_scaling_factor
(num): value of the scaling factor whenosqp_settings_df$prior_scaling = TRUE
(prior_scaling_factor = 1.0
otherwise). -
obj_val_ori_prob
(num): original balancing problem's objective function value, which is the OSQP objective function value (obj_val
) on the original scale (whenosqp_settings_df$prior_scaling = TRUE
) plus the constant term of the original balancing problem's objective function, i.e.,obj_val_ori_prob = obj_val * prior_scaling_factor + <constant term>
, where<constant term>
corresponds to\mathbf{y}^{\mathrm{T}} W \mathbf{y}
. See section Details for the definition of vector\mathbf{y}
, matrixW
and, more generally speaking, the complete expression of the balancing problem's objective function.
-
Column
proc_grp
constitutes a unique key (distinct rows) for the data frame. Visit https://osqp.org for more information on OSQP. Problems (processing groups) for which the initial solution was returned (proc_grp_df$sol_type = "initial"
) are not included in this data frame. -
Note that the "data.frame" objects returned by the function can be explicitly coerced to other types of objects with
the appropriate as*()
function (e.g., tibble::as_tibble()
would coerce any of them to a tibble).
Processing groups
The set of periods of a given reconciliation (raking or balancing) problem is called a processing group and either corresponds to:
a single period with period-by-period processing or, when preserving temporal totals, for the individual periods of an incomplete temporal group (e.g., an incomplete year)
or the set of periods of a complete temporal group (e.g., a complete year) when preserving temporal totals.
The total number of processing groups (total number of reconciliation problems) depends on the set of
periods in the input time series object (argument in_ts
) and on the value of arguments
temporal_grp_periodicity
and temporal_grp_start
.
Common scenarios include temporal_grp_periodicity = 1
(default) for period-by period processing without
temporal total preservation and temporal_grp_periodicity = frequency(in_ts)
for the preservation of annual
totals (calendar years by default). Argument temporal_grp_start
allows the specification of other types of
(non-calendar) years. E.g., fiscal years starting on April correspond to temporal_grp_start = 4
with monthly
data and temporal_grp_start = 2
with quarterly data. Preserving quarterly totals with monthly data would
correspond to temporal_grp_periodicity = 3
.
By default, temporal groups covering more than a year (i.e., corresponding to temporal_grp_periodicity > frequency(in_ts)
start on a
year that is a multiple of
ceiling(temporal_grp_periodicity / frequency(in_ts))
. E.g., biennial groups corresponding to temporal_grp_periodicity = 2 * frequency(in_ts)
start on an even year by default. This behaviour can be changed with argument temporal_grp_start
. E.g., the
preservation of biennial totals starting on an odd year instead of an even year (default) corresponds to
temporal_grp_start = frequency(in_ts) + 1
(along with temporal_grp_periodicity = 2 * frequency(in_ts)
).
See the gs.build_proc_grps()
Examples for common processing group scenarios.
Comparing tsraking()
and tsbalancing()
-
tsraking()
is limited to one- and two-dimensional aggregation table raking problems (with temporal total preservation if required) whiletsbalancing()
handles more general balancing problems (e.g., higher dimensional raking problems, nonnegative solutions, general linear equality and inequality constraints as opposed to aggregation rules only, etc.). -
tsraking()
returns the generalized least squared solution of the Dagum and Cholette regression-based raking model (Dagum and Cholette 2006) whiletsbalancing()
solves the corresponding quadratic minimization problem using a numerical solver. In most cases, convergence to the minimum is achieved and thetsbalancing()
solution matches the (exact)tsraking()
least square solution. It may not be the case, however, if convergence could not be achieved after a reasonable number of iterations. Having said that, only in very rare occasions will thetsbalancing()
solution significantly differ from thetsraking()
solution. -
tsbalancing()
is usually faster thantsraking()
, especially for large raking problems, but is generally more sensitive to the presence of (small) inconsistencies in the input data associated to the redundant constraints of fully specified (over-specified) raking problems.tsraking()
handles these inconsistencies by using the Moore-Penrose inverse (uniform distribution among all binding totals). -
tsbalancing()
accommodates the specification of sparse problems in their reduced form. This is not true in the case oftsraking()
where aggregation rules must always be fully specified since a complete data cube without missing data is expected as input (every single inner-cube component series must contribute to all dimensions of the cube, i.e., to every single outer-cube marginal total series). Both tools handle negative values in the input data differently by default. While the solutions of raking problems obtained from
tsbalancing()
andtsraking()
are identical when all input data points are positive, they will differ if some data points are negative (unless argumentVmat_option = 2
is specified withtsraking()
).While both
tsbalancing()
andtsraking()
allow the preservation of temporal totals, time management is not incorporated intsraking()
. For example, the construction of the processing groups (sets of periods of each raking problem) is left to the user withtsraking()
and separate calls must be submitted for each processing group (each raking problem). That's where helper functiontsraking_driver()
comes in handy withtsraking()
.-
tsbalancing()
returns the same set of series as the input time series object whiletsraking()
returns the set of series involved in the raking problem plus those specified with argumentid
(which could correspond to a subset of the input series).
References
Dagum, E. B. and P. Cholette (2006). Benchmarking, Temporal Distribution and Reconciliation Methods of Time Series. Springer-Verlag, New York, Lecture Notes in Statistics, Vol. 186.
Ferland, M., S. Fortier and J. Bérubé (2016). "A Mathematical Optimization Approach to Balancing Time Series: Statistics Canada’s GSeriesTSBalancing". In JSM Proceedings, Business and Economic Statistics Section. Alexandria, VA: American Statistical Association. 2292-2306.
Ferland, M. (2018). "Time Series Balancing Quadratic Problem — Hessian matrix and vector of linear objective function coefficients". Internal document. Statistics Canada, Ottawa, Canada.
Quenneville, B. and S. Fortier (2012). "Restoring Accounting Constraints in Time Series – Methods and Software for a Statistical Agency". Economic Time Series: Modeling and Seasonality. Chapman & Hall, New York.
SAS Institute Inc. (2015). "The LP Procedure Sparse Data Input Format". SAS/OR^\circledR
14.1
User's Guide: Mathematical Programming Legacy Procedures.
https://support.sas.com/documentation/cdl/en/ormplpug/68158/HTML/default/viewer.htm#ormplpug_lp_details03.htm
Statistics Canada (2016). "The GSeriesTSBalancing Macro". G-Series 2.0 User Guide. Statistics Canada, Ottawa, Canada.
Statistics Canada (2018). Theory and Application of Reconciliation (Course code 0437). Statistics Canada, Ottawa, Canada.
Stellato, B., G. Banjac, P. Goulart et al. (2020). "OSQP: an operator splitting solver for quadratic programs". Math. Prog. Comp. 12, 637–672 (2020). doi:10.1007/s12532-020-00179-2
See Also
tsraking()
tsraking_driver()
rkMeta_to_blSpecs()
gs.build_proc_grps()
build_balancing_problem()
aliases
Examples
###########
# Example 1: In this first example, the objective is to balance a following simple
# accounting table (`Profits = Revenues – Expenses`) for 5 quarters
# without modifying `Profits` where `Revenues >= 0` and `Expenses >= 0`.
# Problem specifications
my_specs1 <- data.frame(type = c("EQ", rep(NA, 3),
"alter", NA,
"lowerBd", NA, NA),
col = c(NA, "Revenues", "Expenses", "Profits",
NA, "Profits",
NA, "Revenues", "Expenses"),
row = c(rep("Accounting Rule", 4),
rep("Alterability Coefficient", 2),
rep("Lower Bound", 3)),
coef = c(NA, 1, -1, -1,
NA, 0,
NA, 0, 0))
my_specs1
# Problem data
my_series1 <- ts(matrix(c( 15, 10, 10,
4, 8, -1,
250, 250, 5,
8, 12, 0,
0, 45, -55),
ncol = 3,
byrow = TRUE,
dimnames = list(NULL, c("Revenues", "Expenses", "Profits"))),
start = c(2022, 1),
frequency = 4)
# Reconcile the data
out_balanced1 <- tsbalancing(in_ts = my_series1,
problem_specs_df = my_specs1,
display_level = 3)
# Initial data
my_series1
# Reconciled data
out_balanced1$out_ts
# Check for invalid solutions
any(out_balanced1$proc_grp_df$sol_status_val < 0)
# Display the maximum output constraint discrepancies
out_balanced1$proc_grp_df[, c("proc_grp_label", "max_discr")]
# The solution returned by `tsbalancing()` corresponds to equal proportional changes
# (pro-rating) and is related to the default alterability coefficients of 1. Equal
# absolute changes could be obtained instead by specifying alterability coefficients
# equal to the inverse of the initial values.
#
# Let’s do this for the processing group 2022Q2 (`timeVal = 2022.25`), with the default
# displayed level of information (`display_level = 1`).
my_specs1b <- rbind(cbind(my_specs1,
data.frame(timeVal = rep(NA_real_, nrow(my_specs1)))),
data.frame(type = rep(NA, 2),
col = c("Revenues", "Expenses"),
row = rep("Alterability Coefficient", 2),
coef = c(0.25, 0.125),
timeVal = rep(2022.25, 2)))
my_specs1b
out_balanced1b <- tsbalancing(in_ts = my_series1,
problem_specs_df = my_specs1b)
# Display the initial 2022Q2 values and both solutions
cbind(data.frame(Status = c("initial", "pro-rating", "equal change")),
rbind(as.data.frame(my_series1[2, , drop = FALSE]),
as.data.frame(out_balanced1$out_ts[2, , drop = FALSE]),
as.data.frame(out_balanced1b$out_ts[2, , drop = FALSE])),
data.frame(Accounting_discr = c(my_series1[2, 1] - my_series1[2, 2] -
my_series1[2, 3],
out_balanced1$out_ts[2, 1] -
out_balanced1$out_ts[2, 2] -
out_balanced1$out_ts[2, 3],
out_balanced1b$out_ts[2, 1] -
out_balanced1b$out_ts[2, 2] -
out_balanced1b$out_ts[2, 3]),
RelChg_Rev = c(NA,
out_balanced1$out_ts[2, 1] / my_series1[2, 1] - 1,
out_balanced1b$out_ts[2, 1] / my_series1[2, 1] - 1),
RelChg_Exp = c(NA,
out_balanced1$out_ts[2, 2] / my_series1[2, 2] - 1,
out_balanced1b$out_ts[2, 2] / my_series1[2, 2] - 1),
AbsChg_Rev = c(NA,
out_balanced1$out_ts[2, 1] - my_series1[2, 1],
out_balanced1b$out_ts[2, 1] - my_series1[2, 1]),
AbsChg_Exp = c(NA,
out_balanced1$out_ts[2, 2] - my_series1[2, 2],
out_balanced1b$out_ts[2, 2] - my_series1[2, 2])))
###########
# Example 2: In this second example, consider the simulated data on quarterly
# vehicle sales by region (West, Centre and East), along with a national
# total for the three regions, and by type of vehicles (cars, trucks and
# a total that may include other types of vehicles). The input data correspond
# to directly seasonally adjusted data that have been benchmarked to the
# annual totals of the corresponding unadjusted time series data as part
# of the seasonal adjustment process (e.g., with the FORCE spec in the
# X-13ARIMA-SEATS software).
#
# The objective is to reconcile the regional sales to the national sales
# without modifying the latter while ensuring that the sum of the sales of
# cars and trucks do not exceed 95% of the sales for all types of vehicles
# in any quarter. For illustrative purposes, we assume that the sales of
# trucks in the Centre region for the 2nd quarter of 2022 cannot be modified.
# Problem specifications
my_specs2 <- data.frame(
type = c("EQ", rep(NA, 4),
"EQ", rep(NA, 4),
"EQ", rep(NA, 4),
"LE", rep(NA, 3),
"LE", rep(NA, 3),
"LE", rep(NA, 3),
"alter", rep(NA, 4)),
col = c(NA, "West_AllTypes", "Centre_AllTypes", "East_AllTypes", "National_AllTypes",
NA, "West_Cars", "Centre_Cars", "East_Cars", "National_Cars",
NA, "West_Trucks", "Centre_Trucks", "East_Trucks", "National_Trucks",
NA, "West_Cars", "West_Trucks", "West_AllTypes",
NA, "Centre_Cars", "Centre_Trucks", "Centre_AllTypes",
NA, "East_Cars", "East_Trucks", "East_AllTypes",
NA, "National_AllTypes", "National_Cars", "National_Trucks", "Centre_Trucks"),
row = c(rep("National Total - All Types", 5),
rep("National Total - Cars", 5),
rep("National Total - Trucks", 5),
rep("West Region Sum", 4),
rep("Center Region Sum", 4),
rep("East Region Sum", 4),
rep("Alterability Coefficient", 5)),
coef = c(NA, 1, 1, 1, -1,
NA, 1, 1, 1, -1,
NA, 1, 1, 1, -1,
NA, 1, 1, -.95,
NA, 1, 1, -.95,
NA, 1, 1, -.95,
NA, 0, 0, 0, 0),
time_val = c(rep(NA, 31), 2022.25))
# Beginning and end of the specifications data frame
head(my_specs2, n = 10)
tail(my_specs2)
# Problem data
my_series2 <- ts(
matrix(c(43, 49, 47, 136, 20, 18, 12, 53, 20, 22, 26, 61,
40, 45, 42, 114, 16, 16, 19, 44, 21, 26, 21, 59,
35, 47, 40, 133, 14, 15, 16, 50, 19, 25, 19, 71,
44, 44, 45, 138, 19, 20, 14, 52, 21, 18, 27, 74,
46, 48, 55, 135, 16, 15, 19, 51, 27, 25, 28, 54),
ncol = 12,
byrow = TRUE,
dimnames = list(NULL,
c("West_AllTypes", "Centre_AllTypes", "East_AllTypes",
"National_AllTypes", "West_Cars", "Centre_Cars",
"East_Cars", "National_Cars", "West_Trucks",
"Centre_Trucks", "East_Trucks", "National_Trucks"))),
start = c(2022, 1),
frequency = 4)
# Reconcile without displaying the function header and enforce nonnegative data
out_balanced2 <- tsbalancing(
in_ts = my_series2,
problem_specs_df = my_specs2,
temporal_grp_periodicity = frequency(my_series2),
lower_bound = 0,
quiet = TRUE)
# Initial data
my_series2
# Reconciled data
out_balanced2$out_ts
# Check for invalid solutions
any(out_balanced2$proc_grp_df$sol_status_val < 0)
# Display the maximum output constraint discrepancies
out_balanced2$proc_grp_df[, c("proc_grp_label", "max_discr")]
###########
# Example 3: Reproduce the `tsraking_driver()` 2nd example with `tsbalancing()`
# (1-dimensional raking problem with annual total preservation).
# `tsraking()` metadata
my_metadata3 <- data.frame(series = c("cars_alb", "cars_sask", "cars_man"),
total1 = rep("cars_tot", 3))
my_metadata3
# `tsbalancing()` problem specifications
my_specs3 <- rkMeta_to_blSpecs(my_metadata3)
my_specs3
# Problem data
my_series3 <- ts(matrix(c(14, 18, 14, 58,
17, 14, 16, 44,
14, 19, 18, 58,
20, 18, 12, 53,
16, 16, 19, 44,
14, 15, 16, 50,
19, 20, 14, 52,
16, 15, 19, 51),
ncol = 4,
byrow = TRUE,
dimnames = list(NULL, c("cars_alb", "cars_sask",
"cars_man", "cars_tot"))),
start = c(2019, 2),
frequency = 4)
# Reconcile the data with `tsraking()` (through `tsraking_driver()`)
out_raked3 <- tsraking_driver(in_ts = my_series3,
metadata_df = my_metadata3,
temporal_grp_periodicity = frequency(my_series3),
quiet = TRUE)
# Reconcile the data with `tsbalancing()`
out_balanced3 <- tsbalancing(in_ts = my_series3,
problem_specs_df = my_specs3,
temporal_grp_periodicity = frequency(my_series3),
quiet = TRUE)
# Initial data
my_series3
# Both sets of reconciled data
out_raked3
out_balanced3$out_ts
# Check for invalid `tsbalancing()` solutions
any(out_balanced3$proc_grp_df$sol_status_val < 0)
# Display the maximum output constraint discrepancies from the `tsbalancing()` solutions
out_balanced3$proc_grp_df[, c("proc_grp_label", "max_discr")]
# Confirm that both solutions (`tsraking() and `tsbalancing()`) are the same
all.equal(out_raked3, out_balanced3$out_ts)