Grad {pnd}    R Documentation
Gradient computation with parallel capabilities
Description
Computes numerical derivatives and gradients of scalar-valued functions using finite differences. This function supports both two-sided (central, symmetric) and one-sided (forward or backward) derivatives. It can utilise parallel processing to accelerate computation of gradients for slow functions or to attain higher accuracy faster.
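A minimal illustration of the two modes just described, assuming (as documented under side below) that 0 selects central and 1 selects forward differences; the analytic gradient of sum(sin(x)) is cos(x), so the errors can be inspected directly:

f <- function(x) sum(sin(x))
x0 <- c(0.5, 1, 2)
g.central <- Grad(FUN = f, x = x0)            # two-sided (central) differences
g.forward <- Grad(FUN = f, x = x0, side = 1)  # one-sided forward differences
cbind(central = as.numeric(g.central) - cos(x0),
      forward = as.numeric(g.forward) - cos(x0))
# Central differences are typically markedly more accurate at a comparable cost.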
Usage
Grad(
FUN,
x,
elementwise = NA,
vectorised = NA,
multivalued = NA,
deriv.order = 1L,
side = 0,
acc.order = 2,
stencil = NULL,
h = NULL,
zero.tol = sqrt(.Machine$double.eps),
h0 = NULL,
control = list(),
f0 = NULL,
cores = 1,
preschedule = TRUE,
cl = NULL,
func = NULL,
method = NULL,
method.args = list(),
...
)
Arguments
FUN: A function returning a numeric scalar or a vector whose derivatives are to be computed. If the function returns a vector, the output will be a Jacobian.

x: Numeric vector or scalar: the point(s) at which the derivative is estimated.
elementwise: Logical: is the domain effectively 1D, i.e. is this a mapping applied coordinate by coordinate? If NA, it is inferred from a test evaluation of FUN.

vectorised: Logical: if TRUE, FUN is assumed to accept a vector of inputs and return a vector of outputs of the same length; if NA, this is inferred from a test evaluation.

multivalued: Logical: if TRUE, FUN is assumed to return a vector of length greater than one; if NA, this is inferred from a test evaluation.
deriv.order: Integer or vector of integers indicating the desired derivative order, m, for each element of x.

side: Integer scalar or vector indicating the type of finite difference: 0 for central (two-sided), 1 for forward, and -1 for backward differences.

acc.order: Integer or vector of integers specifying the desired accuracy order for each element of x; the truncation error is of order O(h^acc.order).

stencil: Optional custom vector of points for function evaluation. Must include at least m+1 points for a derivative of order m.
h: Numeric or character specifying the step size(s) for the numerical difference, or a method of automatic step determination (e.g. "SW", passed to gradstep()).

zero.tol: Small positive number: if abs(x) >= zero.tol, the step size is chosen relative to x; otherwise, an absolute step is used.

h0: Numeric scalar or vector: initial step size for the automatic search with gradstep().

control: A named list of tuning parameters passed to gradstep().
f0: Optional numeric: if provided, used to determine the vectorisation type to save time. If FUN(x) must be evaluated anyway (e.g. for second derivatives), this saves one evaluation.

cores: Integer specifying the number of CPU cores used for parallel computation. Recommended to be set to the number of physical cores on the machine minus one.
preschedule: Logical: if TRUE, the evaluation jobs are pre-scheduled across the cores (passed to the parallel back-end); setting it to FALSE may help when evaluation times differ substantially across coordinates.

cl: An optional user-supplied cluster object (e.g. created with parallel::makeCluster()) to be used for the parallel evaluation instead of forking with cores.
func: For compatibility with numDeriv::grad(): an alias for FUN.

method: For compatibility with numDeriv::grad(): "simple" or "Richardson"; if supplied, the settings are translated into their pnd equivalents as described in the Details.

method.args: For compatibility with numDeriv::grad(): a named list of numDeriv-style tuning parameters (eps, d, zero.tol, r, v) translated into their pnd equivalents as described in the Details.
...: Ignored.
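A hedged sketch of the per-coordinate accuracy control described above (assuming acc.order is applied element-wise over x): requesting a higher accuracy order shrinks the truncation error of each gradient coordinate.

f <- function(x) sum(exp(x))
g2 <- Grad(FUN = f, x = c(0, 1), acc.order = 2)  # default second-order accuracy
g4 <- Grad(FUN = f, x = c(0, 1), acc.order = 4)  # fourth-order accuracy
rbind(order2 = as.numeric(g2) - exp(c(0, 1)),    # errors vs. the analytic gradient exp(x)
      order4 = as.numeric(g4) - exp(c(0, 1)))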
Details
This function aims to be 100% compatible with the syntax of numDeriv::grad(), but the step sizes may differ because some choices made in numDeriv are not consistent with theory.

There is one feature of the default step size in numDeriv that deserves an explanation. In that package (but not in pnd):

- If method = "simple", then simple forward differences are used with a fixed step size eps, which we denote by \varepsilon.

- If method = "Richardson", then central differences are used with a fixed step h := |d \cdot x| + \varepsilon \cdot 1(|x| < \mathrm{zero.tol}), where d = 1e-4 is the relative step size and eps becomes an extra addition to the step size for arguments that are closer to zero than zero.tol.

We believe that the latter may lead to mistakes when the user believes that they can set the step size for near-zero arguments, whereas in reality, a combination of d and eps is used.
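For concreteness, the numDeriv-style default step described above can be reproduced with plain arithmetic; this is only an illustration of the formula, not a call into either package:

d   <- 1e-4                              # relative step size
eps <- 1e-4                              # absolute increment for near-zero arguments
zero.tol <- sqrt(.Machine$double.eps)    # zero tolerance used here for illustration
x <- c(0, 1e-9, 1, 100)
h <- abs(d * x) + eps * (abs(x) < zero.tol)
h  # near-zero coordinates receive the absolute eps; the rest receive d * |x|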
Here is the synopsis of the old arguments (a compatibility call sketch follows the list):

- side: numDeriv uses NA for handling two-sided differences. The pnd equivalent is 0, and NA is replaced with 0.

- eps: If the numDeriv method is "simple", then eps = 1e-4 is the absolute step size and forward differences are used. If the method is "Richardson", then eps = 1e-4 is the absolute increment of the step size for small arguments below the zero tolerance.

- d: If the numDeriv method is "Richardson", then d*abs(x) is the step size for arguments above the zero tolerance and the baseline step size for small arguments that gets incremented by eps.

- r: The number of Richardson extrapolations that successively reduce the initial step size. For two-sided differences, each extrapolation increases the accuracy order by 2.

- v: The reduction factor in Richardson extrapolations.
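As a hedged compatibility sketch (relying on the method and method.args arguments listed in the Arguments section, with illustrative tuning values), a numDeriv-style call can be written directly; pnd translates these settings as just described:

f <- function(x) sum(sin(x))
g.compat <- Grad(FUN = f, x = 1:3, method = "Richardson",
                 method.args = list(d = 1e-4, r = 4, v = 2))
as.numeric(g.compat) - cos(1:3)  # should be near zero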
Here are the differences in the new compatible implementation (a small numeric illustration of the new default step follows the list):

- eps: If the numDeriv method is "simple", then ifelse(x != 0, abs(x), 1) * sqrt(.Machine$double.eps) * 2 is used because one-sided differences require a smaller step size to reduce the truncation error. If the method is "Richardson", then eps = 1e-5.

- d, r, v: These retain the meanings given in the synopsis above.
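The replacement step for method = "simple" quoted above is a plain R expression and can be inspected directly (no package call involved):

x <- c(0, 0.1, 2)
h.simple <- ifelse(x != 0, abs(x), 1) * sqrt(.Machine$double.eps) * 2
h.simple  # roughly 3e-8 * |x| for non-zero arguments and an absolute 3e-8 at x = 0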
Grad() does an initial check (if f0 = FUN(x) is not provided) and calls GenD() with a set of appropriate parameters (multivalued = FALSE if the check succeeds). In case of a parameter mismatch, it throws an error.
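Because Grad() forwards the computation to GenD() as described, the two should agree for scalar-valued functions; a hedged check, assuming GenD() accepts the same core arguments:

f <- function(x) sum(sin(x))
g.grad <- Grad(FUN = f, x = 1:4)
g.gend <- GenD(FUN = f, x = 1:4)
all.equal(as.numeric(g.grad), as.numeric(g.gend))  # expected to be TRUE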
Value
Numeric vector of the gradient. If FUN returns a vector, a warning is issued suggesting the use of Jacobian().
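To see the behaviour described above, pass a vector-valued function; the result is still computed, but a warning recommending Jacobian() should be expected (hedged sketch):

fvec <- function(x) c(sum(sin(x)), sum(cos(x)))
J <- Grad(FUN = fvec, x = 1:3)  # expect a warning suggesting Jacobian()
J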
See Also

GenD(), Jacobian()
Examples
f <- function(x) sum(sin(x))
g1 <- Grad(FUN = f, x = 1:4)
g2 <- Grad(FUN = f, x = 1:4, h = 7e-6)
g2 - g1 # Tiny differences due to different step sizes
g.auto <- Grad(FUN = f, x = 1:4, h = "SW")
print(g.auto)
attr(g.auto, "step.search")$exitcode # Success
# Gradients for vectorised functions -- e.g. leaky ReLU
LReLU <- function(x) ifelse(x > 0, x, 0.01*x)
Grad(LReLU, seq(-1, 1, 0.1))
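# A further hedged example of the parallel facilities mentioned in the Description.
# The speed-up matters only for slow functions; forking may be unavailable on some
# platforms, in which case a cluster can be supplied via 'cl' instead of 'cores'.
slowf <- function(x) {Sys.sleep(0.05); sum(sin(x))}
t.serial   <- system.time(g.serial   <- Grad(FUN = slowf, x = 1:4))
t.parallel <- system.time(g.parallel <- Grad(FUN = slowf, x = 1:4, cores = 2))
rbind(serial = t.serial[3], parallel = t.parallel[3])  # elapsed times
all.equal(as.numeric(g.serial), as.numeric(g.parallel))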