design {dgpsi} | R Documentation |
Sequential design of a (D)GP emulator or a bundle of (D)GP emulators
Description
This function implements sequential design and active learning for a (D)GP emulator or a bundle of (D)GP emulators, supporting an array of popular methods as well as user-specified approaches. It can also be used as a wrapper for Bayesian optimization methods.
Usage
design(
object,
N,
x_cand,
y_cand,
n_sample,
n_cand,
limits,
f,
reps,
freq,
x_test,
y_test,
reset,
target,
method,
batch_size,
eval,
verb,
autosave,
new_wave,
M_val,
cores,
...
)
## S3 method for class 'gp'
design(
object,
N,
x_cand = NULL,
y_cand = NULL,
n_sample = 200,
n_cand = lifecycle::deprecated(),
limits = NULL,
f = NULL,
reps = 1,
freq = c(1, 1),
x_test = NULL,
y_test = NULL,
reset = FALSE,
target = NULL,
method = vigf,
batch_size = 1,
eval = NULL,
verb = TRUE,
autosave = list(),
new_wave = TRUE,
M_val = 50,
cores = 1,
...
)
## S3 method for class 'dgp'
design(
object,
N,
x_cand = NULL,
y_cand = NULL,
n_sample = 200,
n_cand = lifecycle::deprecated(),
limits = NULL,
f = NULL,
reps = 1,
freq = c(1, 1),
x_test = NULL,
y_test = NULL,
reset = FALSE,
target = NULL,
method = vigf,
batch_size = 1,
eval = NULL,
verb = TRUE,
autosave = list(),
new_wave = TRUE,
M_val = 50,
cores = 1,
train_N = NULL,
refit_cores = 1,
pruning = TRUE,
control = list(),
...
)
## S3 method for class 'bundle'
design(
object,
N,
x_cand = NULL,
y_cand = NULL,
n_sample = 200,
n_cand = lifecycle::deprecated(),
limits = NULL,
f = NULL,
reps = 1,
freq = c(1, 1),
x_test = NULL,
y_test = NULL,
reset = FALSE,
target = NULL,
method = vigf,
batch_size = 1,
eval = NULL,
verb = TRUE,
autosave = list(),
new_wave = TRUE,
M_val = 50,
cores = 1,
train_N = NULL,
refit_cores = 1,
...
)
Arguments
object |
can be one of the following:
|
N |
the number of iterations for the sequential design. |
x_cand |
a matrix (with each row being a design point and column being an input dimension) that gives a candidate set
from which the next design points are determined. Defaults to |
y_cand |
a matrix (with each row being a simulator evaluation and column being an output dimension) that gives the realizations
from the simulator at input positions in |
n_sample |
Defaults to |
n_cand |
|
limits |
a two-column matrix that gives the ranges of each input dimension, or a vector of length two if there is only one
input dimension. If a vector is provided, it will be converted to a two-column row matrix. The rows of the matrix correspond to input
dimensions, and its first and second columns correspond to the minimum and maximum values of the input dimensions. Set
|
f |
an R function representing the simulator.
See the Note section below for additional details. This argument is required and must be supplied when |
reps |
an integer that gives the number of repetitions of the located design points to be created and used for evaluations of |
freq |
a vector of two integers: the first gives the number of iterations between re-estimations of
the emulator hyperparameters, and the second gives the number of iterations between re-calculations of the evaluation metrics
on the validation set (see |
x_test |
a matrix (with each row being an input testing data point and each column being an input dimension) that gives the testing
input data to evaluate the emulator after each |
y_test |
the testing output data corresponding to
Set to |
reset |
a bool or a vector of bools indicating whether to reset the hyperparameters of the emulator(s) to their initial values (as set during initial construction) before re-fitting.
The re-fitting occurs based on the frequency specified by
Defaults to |
target |
a number or vector specifying the target evaluation metric value(s) at which the sequential design should terminate.
Defaults to |
method |
See |
batch_size |
|
eval |
an R function that computes a customized metric for evaluating emulator performance. The function must adhere to the following rules:
If no custom function is provided, a built-in evaluation metric (RMSE or log-loss, in the case of DGP emulators with categorical likelihoods) will be used.
Defaults to |
verb |
a bool indicating if trace information will be printed during the sequential design.
Defaults to |
autosave |
a list that contains configuration settings for the automatic saving of the emulator:
|
new_wave |
a bool indicating whether the current call to |
M_val |
|
cores |
an integer that gives the number of processes to be used for emulator validation. If set to |
... |
Any arguments with names that differ from those used in |
train_N |
the number of training iterations to be used for re-fitting the DGP emulator at each step of the sequential design:
Defaults to |
refit_cores |
the number of processes to be used to re-fit GP components (in the same layer of a DGP emulator)
at each M-step during the re-fitting. If set to |
pruning |
a bool indicating if dynamic pruning of DGP structures will be implemented during the sequential design after the total number of
design points exceeds |
control |
a list that can supply any of the following components to control the dynamic pruning of the DGP emulator:
The argument is only used when |
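As an illustration of the shape expected by the limits argument described above, the following base R sketch builds the matrix for one and for several input dimensions. The variable names and the numeric ranges are hypothetical and chosen only for illustration.

```r
# One input dimension: a length-two vector (minimum, maximum).
lim_1d <- c(0, 1)
# Mirror the documented conversion of such a vector to a two-column row matrix.
lim_1d_mat <- matrix(lim_1d, nrow = 1)

# Three input dimensions: one row per dimension,
# first column = minimum, second column = maximum.
lim_3d <- rbind(
  c(0, 1),
  c(-5, 5),
  c(10, 20)
)

# Basic sanity checks before passing the matrix on:
# exactly two columns, and every minimum strictly below its maximum.
stopifnot(ncol(lim_3d) == 2, all(lim_3d[, 1] < lim_3d[, 2]))
```

The row order should follow the column order of the training input, in the same way as the column order of the first argument of f.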
Details
See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.
Value
An updated object is returned with a slot called design that contains:
- S slots, named wave1, wave2,..., waveS, that contain information of S waves of sequential design that have been applied to the emulator. Each slot contains the following elements:
  - N, an integer that gives the numbers of iterations implemented in the corresponding wave;
  - rmse, a matrix providing the evaluation metric values for emulators constructed during the corresponding wave, when eval = NULL. Each row of the matrix represents an iteration.
    - for an object of class gp, the matrix contains a single column of RMSE values.
    - for an object of class dgp without a categorical likelihood, each row contains mean/median squared errors corresponding to different output dimensions.
    - for an object of class dgp with a categorical likelihood, the matrix contains a single column of log-loss values.
    - for an object of class bundle, each row contains either mean/median squared errors or log-loss values for the emulators in the bundle.
  - metric: a matrix providing the values of custom evaluation metrics, as computed by the user-supplied eval function, for emulators constructed during the corresponding wave.
  - freq, an integer that gives the frequency that the emulator validations are implemented during the corresponding wave.
  - enrichment, a vector of size N that gives the number of new design points added after each step of the sequential design (if object is an instance of the gp or dgp class), or a matrix that gives the number of new design points added to emulators in a bundle after each step of the sequential design (if object is an instance of the bundle class).
  If target is not NULL, the following additional elements are also included:
  - target: the target evaluating metric computed by the eval or built-in function to stop the sequential design.
  - reached: indicates whether the target was reached at the end of the sequential design:
    - a bool if object is an instance of the gp or dgp class.
    - a vector of bools if object is an instance of the bundle class, with its length determined as follows:
      - equal to the number of emulators in the bundle when eval = NULL.
      - equal to the length of the output from eval when a custom eval function is provided.
- a slot called type that gives the type of validation:
  - either LOO ('loo') or OOS ('oos') if eval = NULL. See validate() for more information about LOO and OOS.
  - 'customized' if a customized R function is provided to eval.
- two slots called x_test and y_test that contain the data points for the OOS validation if the type slot is 'oos'.
- If y_cand = NULL and x_cand is supplied, and there are NAs returned from the supplied f during the sequential design, a slot called exclusion is included that records the located design positions that produced NAs via f. The sequential design will use this information to avoid re-visiting the same locations in later runs of design().
See Note section below for further information.
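The nested structure described in this section can be explored like an ordinary named list. The sketch below uses a hand-built stand-in (mock_design, with made-up metric values) rather than a real emulator, so it runs without the package. It assumes that the design slot of a returned emulator can be indexed with $ and [[ ]] in the same way.

```r
# Hypothetical stand-in for the `design` slot of an emulator returned by design().
# A real slot is produced by the package; the numbers below are placeholders.
mock_design <- list(
  wave1 = list(
    N = 3L,
    rmse = matrix(c(0.42, 0.31, 0.18), ncol = 1),  # one row per iteration
    freq = 1L,
    enrichment = c(1, 1, 1)
  ),
  type = "oos"
)

# Names of the recorded waves (wave1, wave2, ..., waveS).
waves <- grep("^wave[0-9]+$", names(mock_design), value = TRUE)

# Metric values recorded at the last iteration of the most recent wave.
last_wave <- mock_design[[waves[length(waves)]]]
final_metric <- last_wave$rmse[nrow(last_wave$rmse), ]
```

Under the same assumption, replacing mock_design with the design slot of an emulator m (for example m$design) would give the final metric values of its latest wave.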
Note
- Validation of an emulator is forced after the final step of a sequential design even if N is not a multiple of the second element in freq.
- Any loo or oos slot that already exists in object will be cleaned, and a new slot called loo or oos will be created in the returned object depending on whether x_test and y_test are provided. The new slot gives the validation information of the emulator constructed in the final step of the sequential design. See validate() for more information about the slots loo and oos.
- If object has previously been used by design() for sequential design, the information of the current wave of the sequential design will replace that of old waves and be contained in the returned object, unless
  - the validation type (LOO or OOS, depending on whether x_test and y_test are supplied) of the current wave of the sequential design is the same as the validation types (shown in the type of the design slot of object) in previous waves, and, if the validation type is OOS, x_test and y_test in the current wave are also identical to those in the previous waves;
  - both the current and previous waves of the sequential design supply customized evaluation functions to eval. Users need to ensure the customized evaluation functions are consistent among different waves. Otherwise, the trace plot of RMSEs produced by draw() will show values of different evaluation metrics in different waves.
  For the above two cases, the information of the current wave of the sequential design will be added to the design slot of the returned object under the name waveS.
- If object is an instance of the gp class and eval = NULL, the matrix in the rmse slot is single-columned. If object is an instance of the dgp or bundle class and eval = NULL, the matrix in the rmse slot can have multiple columns that correspond to different output dimensions or different emulators in the bundle.
- If object is an instance of the gp class and eval = NULL, target needs to be a single value giving the RMSE threshold. If object is an instance of the dgp or bundle class and eval = NULL, target can be a vector of values that gives the thresholds of evaluation metrics for different output dimensions or different emulators. If a single value is provided, it will be used as the threshold for all output dimensions (if object is an instance of the dgp class) or all emulators (if object is an instance of the bundle class). If a customized function is supplied to eval and target is given as a vector, the user needs to ensure that the length of target is equal to that of the output from eval.
- When defining f, it is important to ensure that:
  - the column order of the first argument of f is consistent with the training input used for the emulator;
  - the column order of the output matrix of f is consistent with the order of emulator output dimensions (if object is an instance of the dgp class), or the order of emulators placed in object (if object is an instance of the bundle class).
  The output matrix produced by f may include NAs. This is useful because it allows the sequential design process to continue without interruption, even if errors or NA outputs are encountered from f at certain input locations identified by the sequential design. Users should ensure that any errors within f are handled by appropriately returning NAs.
- When defining eval, the output metric needs to be positive if draw() is used with log = T. One also needs to ensure that a lower metric value indicates a better emulation performance if target is set.
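One way to follow the advice on returning NAs from f is to wrap the simulator call in tryCatch(). The sketch below is only an illustration: raw_sim is a hypothetical single-point simulator that fails on part of the input domain, and f_safe is a hypothetical wrapper that keeps the matrix-in, matrix-out convention and converts errors to NA.

```r
# Hypothetical raw simulator for a single input point (a numeric vector of
# length two); it deliberately fails when the first input exceeds 0.9.
raw_sim <- function(x_row) {
  if (x_row[1] > 0.9) stop("simulator crashed")
  sin(1 / ((0.7 * x_row[1] + 0.3) * (0.7 * x_row[2] + 0.3)))
}

# Wrapper taking a matrix (one row per design point) and returning a
# one-column matrix (one row per evaluation). Any error raised by raw_sim
# for a given row is replaced by NA for that row only.
f_safe <- function(x) {
  x <- as.matrix(x)
  out <- vapply(
    seq_len(nrow(x)),
    function(i) tryCatch(raw_sim(x[i, ]), error = function(e) NA_real_),
    numeric(1)
  )
  matrix(out, ncol = 1)
}

# The second point triggers the failure and therefore yields NA.
X_try <- rbind(c(0.2, 0.5), c(0.95, 0.5))
Y_try <- f_safe(X_try)
```

A wrapper of this kind could then be supplied as f, so that a failed simulator run is recorded as NA instead of stopping the sequential design.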
Examples
## Not run:
# load packages and the Python env
library(lhs)
library(dgpsi)
# construct a 2D non-stationary function that takes a matrix as the input
f <- function(x) {
  sin(1 / ((0.7 * x[, 1, drop = FALSE] + 0.3) * (0.7 * x[, 2, drop = FALSE] + 0.3)))
}
# generate the initial design
X <- maximinLHS(5,2)
Y <- f(X)
# generate the validation data
validate_x <- maximinLHS(30,2)
validate_y <- f(validate_x)
# train a 2-layered DGP emulator with the initial design
m <- dgp(X, Y)
# specify the ranges of the input dimensions
lim_1 <- c(0, 1)
lim_2 <- c(0, 1)
lim <- rbind(lim_1, lim_2)
# 1st wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)
# 2nd wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)
# 3rd wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)
# draw the design created by the sequential design
draw(m,'design')
# inspect the trace of RMSEs during the sequential design
draw(m,'rmse')
# reduce the number of imputations for faster OOS
m_faster <- set_imp(m, 5)
# plot the OOS validation with the faster DGP emulator
plot(m_faster, x_test = validate_x, y_test = validate_y)
## End(Not run)
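The waves above keep the default freq = c(1, 1), so the emulator is validated after every iteration. For other settings, the following base R sketch lists the iterations at which validation would take place. It assumes, based on the Note section, that validation is triggered on every multiple of the second element of freq and additionally on the final iteration; validation_steps is a hypothetical helper and not part of the package.

```r
# Iterations at which validation would run, assuming it is triggered on every
# multiple of freq[2] and additionally on the final iteration N.
validation_steps <- function(N, freq) {
  steps <- seq_len(N)
  steps[steps %% freq[2] == 0 | steps == N]
}

# A wave of 10 iterations validated every 4 iterations:
# iterations 4 and 8, plus the forced validation at iteration 10.
validation_steps(N = 10, freq = c(1, 4))
```

A larger second element of freq reduces the validation cost per wave, at the price of a coarser trace of the evaluation metric.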