module Chainer

Function node of the computational graph. FunctionNode is a class representing a node in a computational graph. The node corresponds to an application of a differentiable function to input variables. When a differentiable function is applied to `Chainer::Variable` objects, it creates an instance of a FunctionNode implementation and calls its `apply` method. The `apply` method basically does the following three things.

1. Adding an edge from the function node to the variable node corresponding to each input.
   The node of each input is extracted by `Chainer::Variable#node`.
2. Computing the output arrays of the function.
3. Creating a `Chainer::Variable` object for each output array and
   adding an edge from the node of the variable to the function node.

The output variables are then returned.
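
For illustration, a minimal sketch of this flow, reusing Chainer::Functions::Math::Identity (which also appears in the source listings below); any FunctionNode implementation behaves the same way:

x = Chainer::Variable.new(Numo::DFloat.new(2, 3).rand)
ys = Chainer::Functions::Math::Identity.new.apply([x])  # returns an Array of Chainer::Variable
y = ys.first
y.creator_node                                          # the FunctionNode that produced y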

Constants

VERSION

Public Class Methods

_as_tuple(x)
# File lib/chainer/gradient_check.rb, line 53
def _as_tuple(x)
  if x.is_a? Array
    return x
  else
    return [x]
  end
end
_copy_arrays(xs)
# File lib/chainer/gradient_check.rb, line 2
def _copy_arrays(xs)
  xs.map{|x| Chainer.array?(x) ? x.dup : x}
end
array?(obj)

Returns true if the argument is either a Numo::NArray or a Cumo::NArray.

@param [Object] obj
@return [Boolean]

# File lib/chainer/backend.rb, line 19
def array?(obj)
  if CUDA.available?
    return true if obj.kind_of?(Cumo::NArray)
  end
  return true if obj.kind_of?(Numo::NArray)
  false
end
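
For example (a sketch, assuming the method is called as a module function):

Chainer.array?(Numo::DFloat[1, 2, 3])  # => true
Chainer.array?([1, 2, 3])              # => false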
check_backward(func, x_data, y_grad, params=[], eps: 0.001, atol: 1e-5, rtol: 1e-4, no_grads: nil, dtype: nil)

Tests the backward procedure of a given function.

This function automatically checks the backward process of a given function. For example, when you have a Chainer::Function class MyFunc that takes two arguments and returns one value, you can write its test like this:

def test_my_func
  func = MyFunc.new
  x1_data = Numo::NArray[...]
  x2_data = Numo::NArray[...]
  gy_data = Numo::NArray[...]
  check_backward(func, [x1_data, x2_data], gy_data)

This method creates Chainer::Variable objects from x_data and calls func with those Chainer::Variable objects to get its result as a Chainer::Variable. Then, it sets the y_grad array to the grad attribute of the result and calls the backward method to get the gradients of the inputs. To check the correctness of the gradients, the function calls numerical_grad to calculate them numerically and compares the results with Chainer::Testing.assert_allclose. If the input objects (x1_data and/or x2_data in this example) represent integer variables, their gradients are ignored.

You can simplify the test when MyFunc takes only one argument:

check_backward(func, x1_data, gy_data)

If MyFunc is a loss function which returns a zero-dimensional array, pass nil as gy_data. In this case, 1 is set to the grad attribute of the result:

check_backward(my_loss_func, [x1_data, x2_data], nil)

If MyFunc returns multiple outputs, pass all the gradients of the outputs as an Array:

gy1_data = Numo::NArray[...]
gy2_data = Numo::NArray[...]
check_backward(func, x1_data, [gy1_data, gy2_data])

You can also test a Chainer::Link. To check the gradients of the link's parameters, pass an Array of the parameters as the params argument:

check_backward(my_link, [x1_data, x2_data], gy_data, [my_link.W, my_link.b])

Note that params are not Numo::NArray objects but Chainer::Variable objects.

Function objects are acceptable as the func argument:

check_backward(lambda{|x1, x2| f(x1, x2)}, [x1_data, x2_data], gy_data)

@note

+func+ is called many times to get numerical gradients for all inputs.
This function doesn't work correctly when +func+ behaves randomly,
because it then produces different gradients on each call.

@param [Method, Proc] func A function which takes Chainer::Variable s

and returns +Chainer::Variable+ s. +func+ must return
an Array of +Chainer::Variable+ s or a single
+Chainer::Variable+. You can use a +Chainer::Function+
object, a +Chainer::Link+ object or any function satisfying
this condition.

@param [Numo::NArray or Array<Numo::NArray>] x_data A set of Numo::NArray s to be

passed to +func+. If +x_data+ is a single +Numo::NArray+ object, it is
treated as +[x_data]+.

@param [Numo::NArray or Array<Numo::NArray> or nil] y_grad A set of Numo::NArray s representing the gradients of the return values of

+func+. If +y_grad+ is a single +Numo::NArray+ object, it is
treated as +[y_grad]+. If +func+ is a loss function,
+y_grad+ should be set to +nil+.

@param [Chainer::Variable or Array<Chainer::Variable>] params A set of Chainer::Variable s whose gradients are checked.

When +func+ is a +Chainer::Link+ object,
set its parameters as +params+.
If +params+ is a single +Chainer::Variable+ object,
it is treated as +[params]+.

@param [Float] eps Epsilon value to be passed to numerical_grad.
@param [Float] atol Absolute tolerance to be passed to Chainer::Testing.assert_allclose.
@param [Float] rtol Relative tolerance to be passed to Chainer::Testing.assert_allclose.
@param [Array<Boolean>] no_grads Flags to skip variables in the gradient assertion.

It should have the same length as +x_data+.

@param [Numo::NArray.class] dtype x_data and y_grad are cast to this

dtype when calculating numerical gradients. Only float types and
+nil+ are allowed.

@see

.numerical_grad
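
As a complete, minimal sketch (assuming check_backward is called as a module function, and reusing Chainer::Functions::Math::Identity from the source listing below):

x_data  = Numo::DFloat.new(2, 3).rand
gy_data = Numo::DFloat.new(2, 3).rand

# The identity mapping: its backpropagated gradient must match the numerical one.
func = lambda do |x|
  Chainer::Functions::Math::Identity.new.apply([x]).first
end

Chainer.check_backward(func, x_data, gy_data)
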
# File lib/chainer/gradient_check.rb, line 147
def check_backward(func, x_data, y_grad, params=[], eps: 0.001, atol: 1e-5, rtol: 1e-4, no_grads: nil, dtype: nil)
  x_data = _as_tuple(x_data)
  xm = Chainer.get_array_module(*x_data)
  if !y_grad.nil?
    y_grad = _as_tuple(y_grad)
  end

  params = _as_tuple(params)
  xs = x_data.map{|x| Chainer::Variable.new(x)}
  y = func.(*xs)
  y = _as_tuple(y)
  y = Chainer::Functions::Math::Identity.new.apply(y)

  y_grad = set_y_grad(y, y_grad)

  # Clear gradients which may exist if func calls backward inside of itself.
  clear_grads(xs)
  clear_grads(params)

  # We only need to call `backward` for one result `Chainer::Variable`.
  # `Chainer::Variable.backward` method calls `Chainer::Function.backward` of its creator.
  y[0].backward()

  param_data = params.map { |p| p.data }
  if dtype.nil?
    casted_xs = x_data.map { |x| Chainer::Variable.new(x) }
  else
    raise '`dtype` is allowed only float type' if dtype != xm::DFloat && dtype != xm::SFloat
    casted_xs = x_data.map { |x| x.is_a?(Numo::NArray) ? Chainer::Variable.new(x.cast_to(dtype)) : x  }
  end

  if no_grads.nil?
    no_grads = xs.map { |x| x.dtype != Numo::SFloat && x.dtype != Numo::DFloat }
  else
    raise "Length of no_grads param and xs should be same." if no_grads.size != xs.size
  end

  casted_data = casted_xs.map { |x| x.data.dup }

  no_grads.zip(xs).each do |skip, x|
    if skip
      raise "x.grad is not nil" if  x.grad != nil
    else
      raise 'gradients of some arguments are not calculated' if x.grad.nil?
    end
  end

  # Keep the gradient arrays of params which may be overwritten by func
  params_grad = params.map(&:grad)

  if dtype.nil?
    one = Numo::DFloat.new().fill(1.0)
  else
    one = dtype.new().fill(1.0)
  end

  g = lambda do
    # This function is called twice in `numerical_grad`.
    # `one` is `1 + epsilon` or `1 - epsilon` in these calls.
    # See the document of `numerical_grad`.
    no_grads.zip(casted_xs, casted_data).each do |skip, cx, data|
      next if skip || cx.data.empty?
      # cast_to is required to store data with the given type
      data = (one * data).cast_to(data.class)
      cx.data = data
    end

    params.zip(param_data).each do |param, data|
      if !dtype.nil?
        param_dtype = dtype
      else
        param_dtype = param.dtype
      end
      # The inner cast_to is required so that the multiplication is computed in
      # `param_dtype` when data is a low-precision float.
      # The outer one is required to store data with the given type.
      param.data = (one * data.cast_to(param_dtype)).cast_to(param_dtype)
    end

    # Clear gradients to support func that calls backward inside of itself.
    clear_grads(casted_xs)
    clear_grads(params)

    ys = func.(*casted_xs)
    ys = _as_tuple(ys)
    ys_data = ys.map { |y| y.data }
    no_grads.zip(casted_xs, casted_data).each do |skip, cx, data|
      next if skip
      cx.data = data
    end
    params.zip(param_data).each do |param, data|
      param.data = data
    end
    ys_data
  end

  gx, = numerical_grad(g, [one], y_grad, eps)
  gx_accum = 0

  no_grads.zip(xs, casted_xs).each do |skip, x, cx|
    next if skip
    gxi = x.grad.flatten.dup
    cxi = cx.data.flatten.dup
    unless dtype.nil?
      gxi = gxi.cast_to(dtype)
      cxi = cxi.cast_to(dtype)
    end
    gx_accum += gxi.empty? ? 0 : gxi.dot(cxi)
  end

  params.zip(params_grad).each do |p, gpi|
    gpi = gpi.flatten.dup
    pi = p.data.flatten.dup
    unless dtype.nil?
      gpi = gpi.cast_to(dtype)
      pi = pi.cast_to(dtype)
    end
    gx_accum += gpi.dot(pi)
  end

  Chainer::Testing.assert_allclose(gx, gx_accum, atol: atol, rtol: rtol)
end
check_double_backward(func, x_data, y_grad, x_grad_grad, params=[], params_grad_grad=[], eps: 1e-3, atol: 1e-4, rtol: 1e-3, no_grads: nil, dtype: nil)
# File lib/chainer/gradient_check.rb, line 270
def check_double_backward(func, x_data, y_grad, x_grad_grad, params=[], params_grad_grad=[], eps: 1e-3, atol: 1e-4, rtol: 1e-3, no_grads: nil, dtype: nil)
  x_data = _as_tuple(x_data)
  params = _as_tuple(params)
  n_x = x_data.size

  first_order_grad = -> *inputs do
    xs = inputs[0...n_x]
    gys = inputs[n_x..-1]

    y = _as_tuple(func.(*xs))
    # Let all elements of y share the same creator.
    # See the comment in check_backward.
    y = Chainer::Functions::Math::Identity.new.apply(y)
    set_y_grad(y, gys)
    y[0].backward(enable_double_backprop: true)

    xs.map(&:grad_var) + params.map(&:grad_var)
  end

  inputs = x_data + _as_tuple(y_grad)
  grad_grad = _as_tuple(x_grad_grad) + _as_tuple(params_grad_grad)
  check_backward(first_order_grad, inputs, grad_grad, params=params, eps: eps, atol: atol, rtol: rtol, no_grads: no_grads, dtype: dtype)
end
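
Usage mirrors check_backward, with an extra set of "grad of grad" arrays for the inputs (and, when testing a link, for the parameters). A purely illustrative sketch, where my_func is a placeholder for a twice-differentiable function:

x_data   = Numo::DFloat.new(2, 3).rand
gy_data  = Numo::DFloat.new(2, 3).rand
ggx_data = Numo::DFloat.new(2, 3).rand  # second-order "grad of grad" array for the input

# my_func is illustrative only; it is not defined in this library.
Chainer.check_double_backward(my_func, x_data, gy_data, ggx_data)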
configuration()
# File lib/chainer.rb, line 97
def self.configuration
  @configuration ||= Configuration.new
end
configure() { |configuration| ... }
# File lib/chainer.rb, line 93
def self.configure
  yield(configuration)
end
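
For example, a minimal sketch that toggles the enable_backprop flag (the same flag used by Chainer.grad below):

Chainer.configure do |config|
  config.enable_backprop = false  # do not build the computational graph during forward passes
end
Chainer.configuration.enable_backprop  # => false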
get_array_module(*args)

Gets the appropriate array module, Numo or Cumo, for the given arrays.

@param [Array<Chainer::Variable> or Array<Numo::NArray> or Array<Cumo::NArray>] args Values used to determine whether Numo or Cumo should be used.
@return [Module] Cumo or Numo, chosen based on the types of the arguments.

# File lib/chainer/backend.rb, line 6
def get_array_module(*args)
  arrays = args.map {|v| v.kind_of?(Chainer::Variable) ? v.data : v }
  if CUDA.available?
    return Cumo if arrays.any? {|a| a.kind_of?(Cumo::NArray) }
  end
  return Numo
end
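
A minimal sketch: pick the backend module that matches the inputs, then allocate new arrays with it.

xm = Chainer.get_array_module(Numo::DFloat.new(3).rand)  # => Numo (Cumo when a Cumo::NArray is given and CUDA is available)
zeros = xm::DFloat.zeros(2, 2)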
grad(outputs, inputs, grad_outputs: nil, grad_inputs: nil, set_grad: false, retain_grad: false, enable_double_backprop: false)
# File lib/chainer/function_node.rb, line 248
def self.grad(outputs, inputs, grad_outputs: nil, grad_inputs: nil, set_grad: false, retain_grad: false, enable_double_backprop: false)
  # The implementation consists of three steps.

  if !outputs.is_a?(Array)
    raise TypeError, "outputs must be Array, not #{outputs.class}"
  end
  if !inputs.is_a?(Array)
    raise TypeError, "inputs must be Array, not #{inputs.class}"
  end
  if !grad_outputs.nil? && !grad_outputs.is_a?(Array)
    raise TypeError, "grad_outputs must be Array, not #{grad_outputs.class}"
  end
  if !grad_inputs.nil? && !grad_inputs.is_a?(Array)
    raise TypeError, "grad_inputs must be Array, not #{grad_inputs.class}"
  end

  # 1. Backward enumeration: all the nodes reachable backward from the output
  #    nodes are enumerated. The forward direction links are collected in
  #    this step. Note that the variable nodes whose requires_grad is false
  #    are ignored and their creators are not searched.
  candidate_funcs = outputs.map(&:creator_node).compact
  visited_funcs = Set.new
  forward_graph = {}

  while func = candidate_funcs.pop
    next if visited_funcs.include?(func)
    visited_funcs.add(func)

    func.inputs.each do |x|
      next unless x.requires_grad
      forward_graph[x] = [] if forward_graph[x].nil?
      forward_graph[x] << func
      creator = x.creator_node
      if creator && !visited_funcs.include?(creator)
        candidate_funcs << creator
      end
    end
  end

  # 2. Forward enumeration: all the nodes in the subgraph reachable from the
  #    input nodes are enumerated. The extracted (sub-)subgraph is the union
  #    of all paths that backpropagation will visit.
  candidate_vars = inputs.map(&:node)
  visited_funcs = Set.new
  grad_required = Set.new
  while x = candidate_vars.pop
    grad_required.add(x)
    forward_graph[x].each do |func|
      next if visited_funcs.include?(func)
      visited_funcs.add(func)
      func.outputs.each do |y_ref|
        y = y_ref.__getobj__
        if y && forward_graph[y]
          candidate_vars << y
        end
      end
    end
  end

  # 3. Backpropagation: the backpropagation is executed along the
  #    (sub-)subgraph. It uses the topological order of the subgraph which is
  #    induced by the reversed order of function applications ("rank").
  grads = {}  # mapping from variable nodes to their gradients

  # Initialize the gradient mapping.
  grad_outputs = [nil] * outputs.size if grad_outputs.nil?
  outputs.zip(grad_outputs).each do |y, gy|
    if gy.nil?
      gy_data = y.data.new_ones
      gy = Chainer::Variable.new(gy_data, requires_grad: false)
    end

    grads[y.node] = gy
  end

  unless grad_inputs.nil?
    inputs.zip(grad_inputs).each do |x, gx|
      grads[x.node] = gx unless gx.nil?
    end
  end

  # Backprop implementation. It edits grads which will only contain the
  # gradients w.r.t. the inputs.
  old_enable_backprop = Chainer.configuration.enable_backprop
  Chainer.configuration.enable_backprop = enable_double_backprop
  backprop(outputs, inputs, grad_required, retain_grad, grads)
  Chainer.configuration.enable_backprop = old_enable_backprop

  # Extract the gradients w.r.t. the inputs and return them.
  ret = inputs.map { |x| grads[x.node] }
  if set_grad
    inputs.zip(ret).each do |x, gx|
      x.grad_var = gx
    end
  end

  ret
end
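
A minimal sketch of computing gradients with Chainer.grad, again reusing Chainer::Functions::Math::Identity:

x = Chainer::Variable.new(Numo::DFloat.new(2, 3).rand)
y = Chainer::Functions::Math::Identity.new.apply([x]).first
gx, = Chainer.grad([y], [x])
gx.data  # an array of ones: the gradient of the identity mapping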
numerical_grad(f, inputs, grad_outputs, eps=1e-3)

Computes numerical gradient by finite differences.

This function is used to implement gradient checks. For a usage example, see the unit tests of Chainer::Functions.

@param [Proc] f A Ruby function with no arguments that runs the forward

computation and returns the result.

@param [Array<Arrays>] inputs Array of arrays that should be treated as

inputs. Each element is slightly perturbed to compute the numerical
gradient by finite differences.

@param [Array<Arrays>] grad_outputs Array of arrays that are treated as

output gradients.

@param [Float] eps Epsilon value of finite differences.
@return [Array] Numerical gradient arrays corresponding to inputs.
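
A minimal sketch for f(x) = x**2 element-wise (all names below are illustrative; numerical_grad is assumed to be callable as a module function):

x  = Numo::DFloat.new(3).rand
gy = Numo::DFloat.ones(3)        # upstream gradient, same shape as the output
f  = lambda { [x * x] }          # forward pass; reads the perturbed x in place
grads = Chainer.numerical_grad(f, [x], [gy])
grads[0]                         # approximately 2 * x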

# File lib/chainer/gradient_check.rb, line 21
def numerical_grad(f, inputs, grad_outputs, eps=1e-3)
  raise unless eps > 0
  inputs = inputs.to_a
  grad_outputs = grad_outputs.to_a
  grads = inputs.map{|x| x.new_zeros()}

  inputs.zip(grads).each do |x, gx|
    orig_x = x.dup # hold original value
    x.each_with_index{|_, *i|
      orig = orig_x[*i]
      x[*i] = orig + eps
      ys1 = _copy_arrays(f.())
      x[*i] = orig - eps
      ys2 = _copy_arrays(f.())
      x[*i] = orig

      ys1.zip(ys2, grad_outputs).each do |y1, y2, gy|
        next if gy.nil?
        diff = y1 - y2
        if Chainer.array?(diff) && diff.empty?
          dot = 0
        else
          dot = (diff * gy).sum
        end
        gx[*i] += dot / (2 * eps)
      end
    }
  end

  return grads
end

Private Class Methods

backprop(outputs, inputs, grad_required, retain_grad, grads)
# File lib/chainer/function_node.rb, line 347
def self.backprop(outputs, inputs, grad_required, retain_grad, grads)
  candidate_funcs = []
  visited_funcs = Set.new

  push_candidate = -> (func) do
    return if visited_funcs.include?(func)

    # Keep the candidates sorted by rank (ascending) so that pop_candidate
    # returns the function with the largest rank first.
    visited_funcs.add(func)
    candidate_funcs.unshift(func)
    candidate_funcs.sort_by! { |f| f.rank }
  end

  pop_candidate = -> () do
    candidate_funcs.pop
  end

  outputs.each do |y|
    creator = y.creator_node
    next if creator.nil?
    push_candidate.(creator)
  end

  input_nodes = Set.new(inputs.map(&:node))

  while func = pop_candidate.()
    # Collect the gradients w.r.t. the outputs
    gys = []

    func.outputs.each do |y_ref|
      y = y_ref.__getobj__
      if y.nil?
        gys << nil
        next
      end
      gys << grads[y]
    end

    # Collect the gradients w.r.t. the inputs
    #
    # Note (Tokui): when the same variable is passed multiple times as
    # inputs in the same function (e.g. an expression like f(x, x)), the
    # current implementation passes None as the current gradient w.r.t.
    # such an input except for the first one (i.e., it builds gxs like
    # (gx, None) where gx is the current gradient w.r.t. x).
    gxs = []
    input_indexes = []
    selected_inputs = Set.new
    func.inputs.each_with_index do |x, i|
      next unless grad_required.include?(x)

      input_indexes << i
      if selected_inputs.include?(x)
        gxs << nil
      else
        gxs << grads[x]
        selected_inputs.add(x)
      end
    end

    next if input_indexes.empty?

    # Do backward
    new_gxs = func.backward_accumulate(input_indexes, gys, gxs)

    # Delete output gradients that are not required to return
    func.outputs.each do |y_ref|
      y = y_ref.__getobj__
      if y && grads[y] && !input_nodes.include?(y)
        grads.delete(y)
      end
    end

    # Update grads
    selected_inputs = Set.new
    input_indexes.zip(new_gxs).each do |i, g|
      next if g.nil?

      node = func.inputs[i]
      if selected_inputs.include?(node)
        # Accumulate the duplicated gradients here
        cur_gx = grads[node]
        if cur_gx
          g = g + cur_gx
        end
      else
        selected_inputs.add(node)
      end

      grads[node] = g

      if retain_grad
        v = node.get_variable
        if v
          v.grad_var = g
        end
      end

      creator = node.creator_node
      if creator
        push_candidate.(creator)
      end
    end
  end
end
clear_grads(xs)
# File lib/chainer/gradient_check.rb, line 316
def clear_grads(xs)
  xs.each do |x|
    x.grad_var = nil
  end
end
set_y_grad(y, y_grad)
# File lib/chainer/gradient_check.rb, line 294
def set_y_grad(y, y_grad)
  if y_grad.nil?
    if y.size != 1
      raise TypeError, 'When `y_grad` is `nil`, the function must return a zero-dimensional array'
    end
    y_grad = [1]
  else
    if y.size != y_grad.size
      raise TypeError, '`y_grad` must have the same length as the output values'
    end
    y.zip(y_grad).each do |iy, igy|
      if igy.is_a?(Chainer::Variable)
        iy.grad_var = igy
      else
        iy.grad = igy
      end
    end
  end

  y_grad
end

Private Instance Methods

The helper methods documented above (_as_tuple, _copy_arrays, array?, check_backward, check_double_backward, clear_grads, get_array_module, numerical_grad and set_y_grad) are also defined as private instance methods; their documentation and source are identical to the entries above.