Grad {pnd}    R Documentation

Gradient computation with parallel capabilities

Description

Computes numerical derivatives and gradients of scalar-valued functions using finite differences. This function supports both two-sided (central, symmetric) and one-sided (forward or backward) derivatives. It can utilise parallel processing to accelerate computation of gradients for slow functions or to attain higher accuracy faster.

Usage

Grad(
  FUN,
  x,
  elementwise = NA,
  vectorised = NA,
  multivalued = NA,
  deriv.order = 1L,
  side = 0,
  acc.order = 2,
  stencil = NULL,
  h = NULL,
  zero.tol = sqrt(.Machine$double.eps),
  h0 = NULL,
  control = list(),
  f0 = NULL,
  cores = 1,
  preschedule = TRUE,
  cl = NULL,
  func = NULL,
  method = NULL,
  method.args = list(),
  ...
)

Arguments

FUN

A function returning a numeric scalar or a vector whose derivatives are to be computed. If the function returns a vector, the output will be a Jacobian.

x

Numeric vector or scalar: the point(s) at which the derivative is estimated. FUN(x) must be finite.

elementwise

Logical: is the domain effectively 1D, i.e. is this a mapping \mathbb{R} \mapsto \mathbb{R} or \mathbb{R}^n \mapsto \mathbb{R}^n? If NA, compares the output length to the input length.

vectorised

Logical: if TRUE, the function is assumed to be vectorised: it will accept a vector of parameters and return a vector of values of the same length. Use FALSE or "no" for functions that take vector arguments and return outputs of arbitrary length (for \mathbb{R}^n \mapsto \mathbb{R}^k functions). If NA, checks the output length and assumes vectorisation if it matches the input length; this check is necessary and potentially slow.

multivalued

Logical: if TRUE, the function is assumed to return vectors longer than 1. Use FALSE for element-wise functions. If NA, attempts to infer it from the function output.
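
These three flags can be supplied explicitly to skip the auto-detection. A minimal sketch of the two most common cases (the flag values here are illustrative):

# Element-wise vectorised function: R^n -> R^n applied coordinate by coordinate
Grad(sin, x = 1:4, elementwise = TRUE, vectorised = TRUE, multivalued = FALSE)
# Scalar-valued function of a vector argument: R^n -> R
Grad(function(x) sum(sin(x)), x = 1:4,
     elementwise = FALSE, vectorised = FALSE, multivalued = FALSE)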

deriv.order

Integer or vector of integers indicating the desired derivative order, \mathrm{d}^m / \mathrm{d}x^m, for each element of x.

side

Integer scalar or vector indicating the type of finite difference: 0 for central, 1 for forward, and -1 for backward differences. Central differences are recommended unless computational cost is prohibitive.
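
One-sided differences are useful near domain boundaries. A small sketch: forward differences keep every evaluation point inside the domain of log.

f <- function(x) sum(log(x))              # undefined for x <= 0
Grad(FUN = f, x = c(0.01, 1), side = 1)   # close to 1/x = c(100, 1)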

acc.order

Integer or vector of integers specifying the desired accuracy order for each element of x. The final error will be of the order O(h^{\mathrm{acc.order}}).
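
For example, a fourth-order-accurate second derivative (a sketch; d^2/dx^2 sin x = -sin x):

f <- function(x) sum(sin(x))
d2 <- Grad(FUN = f, x = 1:4, deriv.order = 2, acc.order = 4)
d2 + sin(1:4)   # should be near zero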

stencil

Optional custom vector of points for function evaluation. Must include at least m+1 points for the m-th order derivative.
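
For instance, a symmetric 4-point stencil for the first derivative has more than the required m+1 = 2 points, and the extra points buy additional accuracy (a hedged sketch of the interface):

f <- function(x) sum(sin(x))
Grad(FUN = f, x = 1:4, stencil = c(-2, -1, 1, 2))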

h

Numeric or character specifying the step size(s) for the numerical difference or a method of automatic step determination ("CR", "CRm", "DV", or "SW" to be used in gradstep()). The default value is described in ?GenD.

zero.tol

Small positive scalar: if abs(x) >= zero.tol, then the automatically chosen step size is relative (x multiplied by the step) unless an auto-selection procedure is requested; otherwise, it is absolute.
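
A sketch of this rule with a hypothetical relative step factor h (the variable names are illustrative, not pnd internals):

x <- c(1e-12, 2)
zero.tol <- sqrt(.Machine$double.eps)
h <- 1e-5
ifelse(abs(x) >= zero.tol, abs(x) * h, h)   # absolute 1e-5 near zero, relative 2e-5 at x = 2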

h0

Numeric scalar or vector: initial step size for the automatic search with gradstep().

control

A named list of tuning parameters passed to gradstep().

f0

Optional numeric: the value of FUN(x), if already known. Supplying it helps determine the vectorisation type without an extra function call; when FUN(x) must be evaluated anyway (e.g. for second derivatives), it saves one evaluation.

cores

Integer specifying the number of CPU cores used for parallel computation. Recommended to be set to the number of physical cores on the machine minus one.

preschedule

Logical: if TRUE, uses static pre-scheduling with mclapply() or parLapply(); if FALSE, enables load balancing via parLapplyLB(). Pre-scheduling is recommended for functions that take less than 0.1 s per evaluation, where dispatch overhead would otherwise dominate.

cl

An optional user-supplied cluster object (created by makeCluster() or similar functions). If not NULL, parLapply() (when preschedule is TRUE) or parLapplyLB() (otherwise) is used on that cluster on Windows; on all other systems, forking via mclapply() is used.
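
A minimal parallel sketch, assuming a function slow enough to justify the overhead (PSOCK clusters work on all platforms; forking via cores is available on non-Windows systems):

f.slow <- function(x) {Sys.sleep(0.05); sum(sin(x))}
cl <- parallel::makeCluster(2)
g <- Grad(FUN = f.slow, x = 1:4, cl = cl)
parallel::stopCluster(cl)
# Non-Windows alternative: Grad(FUN = f.slow, x = 1:4, cores = 2)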

func

For compatibility with numDeriv::grad() only. If func is supplied instead of FUN, it is reassigned to FUN with a warning.

method

For compatibility with numDeriv::grad() only. Supported values: "simple" and "Richardson". Non-null values result in a warning.

method.args

For compatibility with numDeriv::grad() only. Check ?numDeriv::grad for a list of values. Non-empty lists result in a warning.
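
For instance, a legacy numDeriv-style call should still run, emitting warnings about the compatibility arguments (a sketch; the exact warning text may differ):

f <- function(x) sum(sin(x))
g <- Grad(func = f, x = 1:4, method = "Richardson")   # warns and remaps func to FUN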

...

Ignored.

Details

This function aims to be 100% compatible with the syntax of numDeriv::grad(), but there may be differences in the step size because some choices made in numDeriv are not consistent with the theory.

There is one feature of the default step size in numDeriv that deserves an explanation. In that package (but not in pnd), the step size for arguments below the zero tolerance is not purely relative: the baseline relative step d*abs(x) is incremented by the absolute amount eps (see the synopsis below). We believe that the latter may lead to mistakes when the user believes that they can set the step size for near-zero arguments, whereas in reality, a combination of d and eps is used.
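
A sketch of what numDeriv effectively does for method = "Richardson", following the synopsis below (d, eps, and zero.tol stand for numDeriv's tuning values):

x <- c(0, 1e-9, 0.5)
d <- 1e-4; eps <- 1e-4; zero.tol <- sqrt(.Machine$double.eps)
abs(d * x) + eps * (abs(x) < zero.tol)   # d*|x| + eps near zero, d*|x| otherwise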

Here is the synopsis of the old arguments:

side

numDeriv uses NA for handling two-sided differences. The pnd equivalent is 0, and NA is replaced with 0.

eps

If the numDeriv method is "simple", then eps = 1e-4 is the absolute step size and forward differences are used. If the method is "Richardson", then eps = 1e-4 is the absolute increment of the step size for small arguments below the zero tolerance.

d

If the numDeriv method is "Richardson", then d*abs(x) is the step size for arguments above the zero tolerance; for small arguments, it is the baseline step size that gets incremented by eps.

r

The number of Richardson extrapolations that successively reduce the initial step size. For two-sided differences, each extrapolation increases the accuracy order by 2 (see the sketch after this list).

v

The reduction factor in Richardson extrapolations.
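
To see why each extrapolation gains two orders of accuracy for two-sided differences, here is a minimal stand-alone sketch with v = 2: the central difference has error c h^2 + O(h^4), so combining the estimates at steps h and h/2 cancels the h^2 term.

f <- function(x) exp(x)
h <- 1e-3
d1 <- (f(1 + h) - f(1 - h)) / (2 * h)     # error O(h^2)
d2 <- (f(1 + h/2) - f(1 - h/2)) / h       # error O((h/2)^2)
extrap <- (4 * d2 - d1) / 3               # h^2 terms cancel: error O(h^4)
abs(c(d1, d2, extrap) - exp(1))           # the extrapolated error is the smallest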

Here are the differences in the new compatible implementation.

eps

If the numDeriv method is "simple", then ifelse(x != 0, abs(x), 1) * sqrt(.Machine$double.eps) * 2 is used instead because one-sided differences require a smaller step size to reduce the truncation error. If the method is "Richardson", then eps = 1e-5.
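
The replacement default step for method = "simple" can be computed directly from the formula above:

x <- c(0, 1, 100)
ifelse(x != 0, abs(x), 1) * sqrt(.Machine$double.eps) * 2   # ~3e-8, ~3e-8, ~3e-6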


Grad() does an initial check (if f0 = FUN(x) is not provided) and calls GenD() with the appropriate parameters (multivalued = FALSE if the check succeeds). In case of a parameter mismatch, it throws an error.
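
A hedged sketch of the effect of f0 on the evaluation count (assuming the initial check calls FUN once when f0 is missing; the exact totals depend on the step-selection method):

calls <- 0
f <- function(x) {calls <<- calls + 1; sum(sin(x))}
g1 <- Grad(FUN = f, x = 1:4); n1 <- calls
calls <- 0
g2 <- Grad(FUN = f, x = 1:4, f0 = sum(sin(1:4))); n2 <- calls
n1 - n2   # expected to be 1: the initial check is skipped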

Value

Numeric vector of the gradient. If FUN returns a vector, a warning is issued suggesting the use of Jacobian().

See Also

GenD(), Jacobian()

Examples

f <- function(x) sum(sin(x))
g1 <- Grad(FUN = f, x = 1:4)
g2 <- Grad(FUN = f, x = 1:4, h = 7e-6)
g2 - g1  # Tiny differences due to different step sizes
g.auto <- Grad(FUN = f, x = 1:4, h = "SW")
print(g.auto)
attr(g.auto, "step.search")$exitcode  # Success

# Gradients for vectorised functions -- e.g. leaky ReLU
LReLU <- function(x) ifelse(x > 0, x, 0.01*x)
Grad(LReLU, seq(-1, 1, 0.1))

[Package pnd version 0.1.0 Index]