dgp {dgpsi} | R Documentation |
Deep Gaussian process emulator construction
Description
This function builds and trains a DGP emulator.
Usage
dgp(
X,
Y,
depth = 2,
node = ncol(X),
name = "sexp",
lengthscale = 1,
bounds = NULL,
prior = "ga",
share = TRUE,
nugget_est = FALSE,
nugget = NULL,
scale_est = TRUE,
scale = 1,
connect = TRUE,
likelihood = NULL,
training = TRUE,
verb = TRUE,
check_rep = TRUE,
vecchia = FALSE,
M = 25,
ord = NULL,
N = ifelse(vecchia, 200, 500),
cores = 1,
blocked_gibbs = TRUE,
ess_burn = 10,
burnin = NULL,
B = 10,
internal_input_idx = NULL,
linked_idx = NULL,
id = NULL
)
Arguments
X |
a matrix where each row is an input training data point and each column represents an input dimension. |
Y |
a matrix containing observed training output data. The matrix has its rows being output data points and columns representing
output dimensions. When |
depth |
number of layers (including the likelihood layer) for a DGP structure. |
node |
number of GP nodes in each layer (except for the final layer or the layer feeding the likelihood node) of the DGP. Defaults to
ncol(X). |
name |
a character or a vector of characters that indicates the kernel functions (either
Defaults to "sexp". |
lengthscale |
initial lengthscales for GP nodes in the DGP emulator. It can be a single numeric value or a vector:
Defaults to a numeric value of 1. |
bounds |
the lower and upper bounds of lengthscales in GP nodes. It can be a vector or a matrix:
Defaults to NULL. |
prior |
prior to be used for MAP estimation of lengthscales and nuggets of all GP nodes in the DGP hierarchy:
Defaults to "ga". |
share |
a bool indicating if all input dimensions of a GP node share a common lengthscale. Defaults to TRUE. |
nugget_est |
a bool or a bool vector that indicates if the nuggets of GP nodes (if any) in the final layer are to be estimated. If a single bool is
provided, it will be applied to all GP nodes (if any) in the final layer. If a bool vector (which must have a length of
Defaults to FALSE. |
nugget |
the initial nugget value(s) of GP nodes (if any) in each layer:
Set |
scale_est |
a bool or a bool vector that indicates if the variances of GP nodes (if any) in the final layer are to be estimated. If a single bool is
provided, it will be applied to all GP nodes (if any) in the final layer. If a bool vector (which must have a length of
Defaults to TRUE. |
scale |
the initial variance value(s) of GP nodes (if any) in the final layer. If it is a single numeric value, it will be applied to all GP nodes (if any)
in the final layer. If it is a vector (which must have a length of |
connect |
a bool indicating whether to implement global input connection to the DGP structure. Setting it to |
likelihood |
the likelihood type of a DGP emulator:
When |
training |
a bool indicating if the initialized DGP emulator will be trained.
When set to |
verb |
a bool indicating if the trace information on DGP emulator construction and training will be printed during the function execution.
Defaults to TRUE. |
check_rep |
a bool indicating whether to check for repetitions in the dataset, i.e., if one input
position has multiple outputs. Defaults to TRUE. |
vecchia |
|
M |
|
ord |
If |
N |
number of iterations for the training. Defaults to 500 if vecchia = FALSE and 200 if vecchia = TRUE. |
cores |
the number of processes to be used to optimize GP components (in the same layer) at each M-step of the training. If set to |
blocked_gibbs |
a bool indicating if the latent variables are imputed layer-wise using ESS-within-Blocked-Gibbs. ESS-within-Blocked-Gibbs is typically faster and
more efficient than ESS-within-Gibbs, which imputes latent variables node-wise, because it reduces the number of components to be sampled during Gibbs steps,
especially when layers contain a large number of GP nodes due to higher input dimensions. Defaults to TRUE. |
ess_burn |
number of burn-in steps for the ESS-within-Gibbs
at each I-step of the training. Defaults to 10. |
burnin |
the number of training iterations to be discarded for
point estimates of model parameters. Must be smaller than the training iterations |
B |
the number of imputations used to produce predictions. Increase the value to refine the representation of imputation uncertainty.
Defaults to 10. |
internal_input_idx |
Column indices of |
linked_idx |
Either a vector or a list of vectors:
Set |
id |
an ID to be assigned to the DGP emulator. If an ID is not provided (i.e., |
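The default for N in the Usage block depends on vecchia, and burnin has to stay below the number of training iterations. The following is a minimal sketch of that behaviour in base R. The function resolve_iterations is a hypothetical stand-in written for this illustration, not part of dgpsi, and it copies only the two defaults shown in Usage.

```r
# Hypothetical helper (not a dgpsi function): mirrors the defaults
# N = ifelse(vecchia, 200, 500) and burnin = NULL from the Usage block.
resolve_iterations <- function(vecchia = FALSE,
                               N = ifelse(vecchia, 200, 500),
                               burnin = NULL) {
  # The argument description requires burnin to be smaller than the
  # number of training iterations whenever burnin is supplied.
  if (!is.null(burnin)) stopifnot(burnin < N)
  list(N = N, burnin = burnin)
}

resolve_iterations()$N                      # 500
resolve_iterations(vecchia = TRUE)$N        # 200
resolve_iterations(N = 300, burnin = 100)   # accepted, since 100 < 300
```

Because R evaluates default arguments lazily, the default for N is computed from whatever value of vecchia the caller supplies, which is why the two defaults differ.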
Details
See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.
Value
An S3 class named dgp that contains six slots:
- id: a number or character string assigned through the id argument.
- data: a list that contains two elements, X and Y, which are the training input and output data respectively.
- specs: a list that contains
  - L (i.e., the number of layers in the DGP hierarchy) sub-lists named layer1, layer2,..., layerL. Each sub-list contains D (i.e., the number of GP/likelihood nodes in the corresponding layer) sub-lists named node1, node2,..., nodeD. If a sub-list corresponds to a likelihood node, it contains one element called type that gives the name (Hetero, Poisson, NegBin, or Categorical) of the likelihood node. If a sub-list corresponds to a GP node, it contains four elements:
    - kernel: the type of the kernel function used for the GP node.
    - lengthscales: a vector of lengthscales in the kernel function.
    - scale: the variance value in the kernel function.
    - nugget: the nugget value in the kernel function.
  - internal_dims: the column indices of X that correspond to the linked emulators in the preceding layers of a linked system. The slot will be removed in the next release.
  - external_dims: the column indices of X that correspond to global inputs to the linked system of emulators. It is shown as FALSE if internal_input_idx = NULL. The slot will be removed in the next release.
  - linked_idx: the value passed to argument linked_idx. It is shown as FALSE if the argument linked_idx is NULL. The slot will be removed in the next release.
  - seed: the random seed generated to produce imputations. This information is stored for reproducibility when the DGP emulator (that was saved by write() with the light option light = TRUE) is loaded back to R by read().
  - B: the number of imputations used to generate the emulator.
  - vecchia: whether the Vecchia approximation is used for the GP emulator training.
  - M: the size of the conditioning set for the Vecchia approximation in the DGP emulator training. M is generated only when vecchia = TRUE.
- constructor_obj: a 'python' object that stores the information of the constructed DGP emulator.
- container_obj: a 'python' object that stores the information for the linked emulation.
- emulator_obj: a 'python' object that stores the information for the predictions from the DGP emulator.
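To make the nesting of the specs slot concrete, the sketch below builds a plain R list shaped like the layout described above and walks through its layers. It is a mock, not a real dgp object: the helper gp_node, the object mock, and every numeric value are invented for illustration, and it assumes the slots can be reached with the usual $ list accessor.

```r
# Hypothetical helper: builds the four elements documented for a GP node.
gp_node <- function(kernel, lengthscales, scale, nugget) {
  list(kernel = kernel, lengthscales = lengthscales,
       scale = scale, nugget = nugget)
}

# Mock object shaped like the documented layout for a two-layer emulator.
# All values here are made up; a real emulator comes from dgp().
mock <- list(
  id = "demo",
  specs = list(
    layer1 = list(node1 = gp_node("sexp", 1, 1, 1e-6)),
    layer2 = list(node1 = gp_node("sexp", 1, 1, 1e-6)),
    B = 10,
    vecchia = FALSE
  )
)

# Layer entries are the elements of specs whose names start with "layer".
layer_names <- grep("^layer", names(mock$specs), value = TRUE)

# Collect the kernel name of every node, layer by layer.
kernels <- lapply(mock$specs[layer_names], function(layer) {
  vapply(layer, function(node) node$kernel, character(1))
})

kernels$layer1                         # named vector with node1 = "sexp"
mock$specs$layer2$node1$lengthscales   # lengthscale stored for that node
```

The same access pattern would apply to a trained emulator m in place of mock, provided the slots behave as ordinary list elements.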
The returned dgp object can be used by
- predict() for DGP predictions.
- continue() for additional DGP training iterations.
- validate() for LOO and OOS validations.
- plot() for validation plots.
- lgp() for linked (D)GP emulator constructions.
- window() for model parameter trimming.
- summary() to summarize the trained DGP emulator.
- write() to save the DGP emulator to a .pkl file.
- set_imp() to change the number of imputations.
- design() for sequential design.
- update() to update the DGP emulator with new inputs and outputs.
Note
Any R vector detected in X and Y will be treated as a column vector and automatically converted into a single-column
R matrix. Thus, if X is a single data point with multiple dimensions, it must be given as a matrix.
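The difference between the two shapes can be seen in base R alone. The sketch below uses made-up values and the illustrative names x_vec, x_col, and x_row; it only shows the matrix shapes the Note refers to and does not call dgpsi.

```r
x_vec <- c(0.2, 0.5, 0.9)

# A single-column matrix: three data points in one input dimension.
# This is the shape the Note says a bare vector is converted to.
x_col <- matrix(x_vec, ncol = 1)

# One data point in three input dimensions must be built explicitly
# as a one-row matrix.
x_row <- matrix(x_vec, nrow = 1)

dim(x_col)  # 3 1
dim(x_row)  # 1 3
```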
Examples
## Not run:
# load the package and the Python env
library(dgpsi)
# construct a step function
f <- function(x) {
  if (x < 0.5) return(-1)
  if (x >= 0.5) return(1)
}
# generate training data
X <- seq(0, 1, length.out = 10)
Y <- sapply(X, f)
# set a random seed
set_seed(999)
# train a DGP emulator
m <- dgp(X, Y)
# continue for further training iterations
m <- continue(m)
# summarize the trained emulator
summary(m)
# trace plot
trace_plot(m)
# trim the traces of model parameters
m <- window(m, 800)
# LOO cross validation
m <- validate(m)
plot(m)
# prediction
test_x <- seq(0, 1, length.out = 200)
m <- predict(m, x = test_x)
# OOS validation
validate_x <- sample(test_x, 10)
validate_y <- sapply(validate_x, f)
plot(m, validate_x, validate_y)
# write and read the constructed emulator
write(m, 'step_dgp')
m <- read('step_dgp')
## End(Not run)