module Chainer
Function node of the computational graph.

FunctionNode is a class representing a node in a computational graph. The node corresponds to an application of a differentiable function to input variables. When a differentiable function is applied to `Chainer::Variable` objects, it creates an instance of a FunctionNode implementation and calls its `apply` method. The `apply` method basically does the following three things:

1. Adding an edge from the function node to the variable node corresponding to each input. The node of each input is extracted by `Chainer::Variable#node`.
2. Computing the output arrays of the function.
3. Creating a `Chainer::Variable` object for each output array and adding an edge from the node of the variable to the function node.

The output variables are then returned.
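As an illustration of the steps above, the following sketch applies the Identity function node (which also appears in the check_backward source below) to a variable and inspects the resulting graph edges. The accessors used (creator_node, inputs) are taken from the source listings on this page; treat the snippet as a sketch rather than canonical usage.

  x = Chainer::Variable.new(Numo::DFloat[1.0, 2.0, 3.0])
  y, = Chainer::Functions::Math::Identity.new.apply([x])  # apply returns an Array of output Variables
  y.creator_node          # the FunctionNode instance created by apply
  y.creator_node.inputs   # edges from the function node back to x's node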
Constants
- VERSION
Public Class Methods
# File lib/chainer/gradient_check.rb, line 53
def _as_tuple(x)
  if x.is_a? Array
    return x
  else
    return [x]
  end
end
# File lib/chainer/gradient_check.rb, line 2
def _copy_arrays(xs)
  xs.map { |x| Chainer.array?(x) ? x.dup : x }
end
Returns true if the argument is either a Numo::NArray or a Cumo::NArray.

@param [Object] obj
@return [Boolean]
# File lib/chainer/backend.rb, line 19
def array?(obj)
  if CUDA.available?
    return true if obj.kind_of?(Cumo::NArray)
  end
  return true if obj.kind_of?(Numo::NArray)
  false
end
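A quick usage sketch based on the implementation above:

  Chainer.array?(Numo::DFloat[1.0, 2.0])  # => true
  Chainer.array?([1.0, 2.0])              # => false (plain Ruby Array)
  Chainer.array?(nil)                     # => false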
Test backward procedure of a given function.

This function automatically checks the backward process of a given function. For example, when you have a Chainer::Function class MyFunc that takes two arguments and returns one value, you can write its test like this:

  def test_my_func
    func = MyFunc.new
    x1_data = Numo::NArray[...]
    x2_data = Numo::NArray[...]
    gy_data = Numo::NArray[...]
    check_backward(func, [x1_data, x2_data], gy_data)
  end
This method creates Chainer::Variable objects from x_data and calls func with those Chainer::Variable objects to get its result as a Chainer::Variable. Then, it sets the y_grad array to the grad attribute of the result and calls the backward method to get the gradients of the inputs. To check correctness of these gradients, the function calls numerical_grad to calculate the gradients numerically and compares the two sets of gradients with Chainer::Testing.assert_allclose. If the input objects (x1_data and/or x2_data in this example) represent integer variables, their gradients are ignored.
You can simplify the test when MyFunc takes only one argument:

  check_backward(func, x1_data, gy_data)

If MyFunc is a loss function which returns a zero-dimensional array, pass nil as gy_data. In this case, 1 is set to the grad attribute of the result:

  check_backward(my_loss_func, [x1_data, x2_data], nil)

If MyFunc returns multiple outputs, pass all gradients for the outputs as an Array:

  gy1_data = Numo::NArray[...]
  gy2_data = Numo::NArray[...]
  check_backward(func, x1_data, [gy1_data, gy2_data])
You can also test a Chainer::Link. To check the gradients of the link's parameters, pass an Array of the parameters as the params argument:

  check_backward(my_link, [x1_data, x2_data], gy_data, [my_link.W, my_link.b])

Note that params are not Numo::NArrays but Chainer::Variables.

Function objects are also acceptable as the func argument:

  check_backward(lambda{|x1, x2| f(x1, x2)}, [x1_data, x2_data], gy_data)
@note
  +func+ is called many times to get numerical gradients for all inputs. This function doesn't work correctly when +func+ behaves randomly, because it then gets different gradients on each call.

@param [Method, Proc] func A function which takes +Chainer::Variable+ s
  and returns +Chainer::Variable+ s. +func+ must return an Array of +Chainer::Variable+ s or one +Chainer::Variable+. You can use a +Chainer::Function+ object, a +Chainer::Link+ object, or any function satisfying this condition.
@param [Numo::NArray or Array<Numo::NArray>] x_data A set of +Numo::NArray+ s to be
  passed to +func+. If +x_data+ is one +Numo::NArray+ object, it is treated as +(x_data,)+.
@param [Numo::NArray or Array<Numo::NArray> or nil] y_grad A set of +Numo::NArray+ s representing gradients of the return values of
  +func+. If +y_grad+ is one +Numo::NArray+ object, it is treated as +(y_grad,)+. If +func+ is a loss function, +y_grad+ should be set to +nil+.
@param [Chainer::Variable or Array<Chainer::Variable>] params A set of +Chainer::Variable+ s whose gradients are checked.
  When +func+ is a +Chainer::Link+ object, set its parameters as +params+. If +params+ is one +Chainer::Variable+ object, it is treated as +(params,)+.
@param [Float] eps Epsilon value to be passed to numerical_grad.
@param [Float] atol Absolute tolerance to be passed to Chainer::Testing.assert_allclose.
@param [Float] rtol Relative tolerance to be passed to Chainer::Testing.assert_allclose.
@param [Array<Boolean>] no_grads Flags to skip variables for the gradient assertion.
  It should be the same length as +x_data+.
@param [Numo::NArray.class] dtype +x_data+ and +y_grad+ are cast to this
  dtype when calculating numerical gradients. Only float types and +nil+ are allowed.
@see numerical_grad
# File lib/chainer/gradient_check.rb, line 147
def check_backward(func, x_data, y_grad, params=[], eps: 0.001, atol: 1e-5, rtol: 1e-4, no_grads: nil, dtype: nil)
  x_data = _as_tuple(x_data)
  xm = Chainer.get_array_module(*x_data)
  if !y_grad.nil?
    y_grad = _as_tuple(y_grad)
  end
  params = _as_tuple(params)

  xs = x_data.map { |x| Chainer::Variable.new(x) }
  y = func.(*xs)
  y = _as_tuple(y)
  y = Chainer::Functions::Math::Identity.new.apply(y)

  y_grad = set_y_grad(y, y_grad)

  # Clear gradients which may exist if func calls backward inside of itself.
  clear_grads(xs)
  clear_grads(params)

  # We only need to call `backward` for one result `Chainer::Variable`.
  # `Chainer::Variable.backward` method calls `Chainer::Function.backward` of its creator.
  y[0].backward()

  param_data = params.map { |p| p.data }
  if dtype.nil?
    casted_xs = x_data.map { |x| Chainer::Variable.new(x) }
  else
    raise '`dtype` is allowed only float type' if dtype != xm::DFloat && dtype != xm::SFloat
    casted_xs = x_data.map { |x| x.is_a?(Numo::NArray) ? Chainer::Variable.new(x.cast_to(dtype)) : x }
  end

  if no_grads.nil?
    no_grads = xs.map { |x| x.dtype != Numo::SFloat && x.dtype != Numo::DFloat }
  else
    raise "Length of no_grads param and xs should be same." if no_grads.size != xs.size
  end

  casted_data = casted_xs.map { |x| x.data.dup }

  no_grads.zip(xs).each do |skip, x|
    if skip
      raise "x.grad is not nil" if x.grad != nil
    else
      raise 'gradients of some arguments are not calculated' if x.grad.nil?
    end
  end

  # Keep the gradient arrays of params which may be overwritten by func
  params_grad = params.map(&:grad)

  if dtype.nil?
    one = Numo::DFloat.new().fill(1.0)
  else
    one = dtype.new().fill(1.0)
  end

  g = lambda do
    # This function is called twice in `numerical_grad`.
    # `one` is `1 + epsilon` or `1 - epsilon` in these calls.
    # See the document of `numerical_grad`.
    no_grads.zip(casted_xs, casted_data).each do |skip, cx, data|
      next if skip || cx.data.empty?
      # cast_to is required to store data with the given type
      data = (one * data).cast_to(data.class)
      cx.data = data
    end

    params.zip(param_data).each do |param, data|
      if !dtype.nil?
        param_dtype = dtype
      else
        param_dtype = param.dtype
      end
      # The inner cast_to is required to calculate __mul__ in
      # `param_dtype` when data is a low-accuracy float.
      # The outer one is required to store data with the given type.
      param.data = (one * data.cast_to(param_dtype)).cast_to(param_dtype)
    end

    # Clear gradients to support func that calls backward inside of itself.
    clear_grads(casted_xs)
    clear_grads(params)

    ys = func.(*casted_xs)
    ys = _as_tuple(ys)
    ys_data = ys.map { |y| y.data }

    no_grads.zip(casted_xs, casted_data).each do |skip, cx, data|
      next if skip
      cx.data = data
    end
    params.zip(param_data).each do |param, data|
      param.data = data
    end

    ys_data
  end

  gx, = numerical_grad(g, [one], y_grad, eps)

  gx_accum = 0
  no_grads.zip(xs, casted_xs).each do |skip, x, cx|
    next if skip
    gxi = x.grad.flatten.dup
    cxi = cx.data.flatten.dup
    unless dtype.nil?
      gxi = gxi.cast_to(dtype)
      cxi = cxi.cast_to(dtype)
    end
    gx_accum += gxi.empty? ? 0 : gxi.dot(cxi)
  end

  params.zip(params_grad).each do |p, gpi|
    gpi = gpi.flatten.dup
    pi = p.data.flatten.dup
    unless dtype.nil?
      gpi = gpi.cast_to(dtype)
      pi = pi.cast_to(dtype)
    end
    gx_accum += gpi.dot(pi)
  end

  Chainer::Testing.assert_allclose(gx, gx_accum, atol: atol, rtol: rtol)
end
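For reference, a minimal end-to-end sketch of calling check_backward; the element-wise product below is an assumed example function (any callable that takes Chainer::Variable objects and returns Chainer::Variable objects works):

  x1_data = Numo::DFloat.new(3, 2).rand
  x2_data = Numo::DFloat.new(3, 2).rand
  gy_data = Numo::DFloat.new(3, 2).rand
  func = -> (x1, x2) { x1 * x2 }   # assumes Chainer::Variable supports element-wise *
  Chainer.check_backward(func, [x1_data, x2_data], gy_data, [], eps: 1e-3, atol: 1e-4, rtol: 1e-3)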
# File lib/chainer/gradient_check.rb, line 270
def check_double_backward(func, x_data, y_grad, x_grad_grad, params=[], params_grad_grad=[], eps: 1e-3, atol: 1e-4, rtol: 1e-3, no_grads: nil, dtype: nil)
  x_data = _as_tuple(x_data)
  params = _as_tuple(params)
  n_x = x_data.size

  first_order_grad = -> *inputs do
    xs = inputs[0...n_x]
    gys = inputs[n_x..-1]

    y = _as_tuple(func.(*xs))
    # Let all elements of y share the same creator.
    # See the comment in check_backward.
    y = Chainer::Functions::Math::Identity.new.apply(y)

    set_y_grad(y, gys)
    y[0].backward(enable_double_backprop: true)

    xs.map(&:grad_var) + params.map(&:grad_var)
  end

  inputs = x_data + _as_tuple(y_grad)
  grad_grad = _as_tuple(x_grad_grad) + _as_tuple(params_grad_grad)
  check_backward(first_order_grad, inputs, grad_grad, params=params, eps: eps, atol: atol, rtol: rtol, no_grads: no_grads, dtype: dtype)
end
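check_double_backward checks the backward process of the first-order gradients (double backpropagation) by feeding them through check_backward. A hedged usage sketch, assuming the example function below supports double backpropagation:

  x_data   = Numo::DFloat.new(4).rand
  gy_data  = Numo::DFloat.new(4).rand
  ggx_data = Numo::DFloat.new(4).rand
  func = -> (x) { x * x }   # assumed example; its first-order gradient must itself be differentiable
  Chainer.check_double_backward(func, x_data, gy_data, ggx_data)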
# File lib/chainer.rb, line 97
def self.configuration
  @configuration ||= Configuration.new
end
# File lib/chainer.rb, line 93
def self.configure
  yield(configuration)
end
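Chainer.configuration returns the global Configuration object, and Chainer.configure yields it for block-style setup. A short usage sketch; enable_backprop is assumed to be a writable attribute, as it is read and written by Chainer.grad below:

  Chainer.configure do |config|
    config.enable_backprop = false   # e.g. disable graph construction globally
  end
  Chainer.configuration.enable_backprop  # => false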
Gets the appropriate array module, Numo or Cumo, for the given arrays.

@param [Array<Chainer::Variable> or Array<Numo::NArray> or Array<Cumo::NArray>] args Values used to determine whether Numo or Cumo
  should be used.
@return [Module] Cumo or Numo, chosen based on the types of the arguments.
# File lib/chainer/backend.rb, line 6
def get_array_module(*args)
  arrays = args.map { |v| v.kind_of?(Chainer::Variable) ? v.data : v }
  if CUDA.available?
    return Cumo if arrays.any? { |a| a.kind_of?(Cumo::NArray) }
  end
  return Numo
end
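Usage sketch:

  xm = Chainer.get_array_module(Numo::DFloat[1, 2, 3])
  # => Numo (Cumo is returned instead when a Cumo::NArray is involved and CUDA is available)
  xm::DFloat.zeros(2, 2)   # allocate with whichever backend was selected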
# File lib/chainer/function_node.rb, line 248
def self.grad(outputs, inputs, grad_outputs: nil, grad_inputs: nil, set_grad: false, retain_grad: false, enable_double_backprop: false)
  # The implementation consists of three steps.

  if !outputs.is_a?(Array)
    raise TypeError, "outputs must be Array, not #{outputs.class}"
  end
  if !inputs.is_a?(Array)
    raise TypeError, "inputs must be Array, not #{inputs.class}"
  end
  if !grad_outputs.nil? && !grad_outputs.is_a?(Array)
    raise TypeError, "grad_outputs must be Array, not #{grad_outputs.class}"
  end
  if !grad_inputs.nil? && !grad_inputs.is_a?(Array)
    raise TypeError, "grad_inputs must be Array, not #{grad_inputs.class}"
  end

  # 1. Backward enumeration: all the nodes reachable backward from the output
  #    nodes are enumerated. The forward direction links are collected in
  #    this step. Note that the variable nodes whose requires_grad is false
  #    are ignored and their creators are not searched.
  candidate_funcs = outputs.map(&:creator_node).compact
  visited_funcs = Set.new
  forward_graph = {}

  while func = candidate_funcs.pop
    next if visited_funcs.include?(func)
    visited_funcs.add(func)

    func.inputs.each do |x|
      next unless x.requires_grad
      forward_graph[x] = [] if forward_graph[x].nil?
      forward_graph[x] << func
      creator = x.creator_node
      if creator && !visited_funcs.include?(creator)
        candidate_funcs << creator
      end
    end
  end

  # 2. Forward enumeration: all the nodes in the subgraph reachable from the
  #    input nodes are enumerated. The extracted (sub-)subgraph is the union
  #    of all paths that backpropagation will visit.
  candidate_vars = inputs.map(&:node)
  visited_funcs = Set.new
  grad_required = Set.new
  while x = candidate_vars.pop
    grad_required.add(x)
    forward_graph[x].each do |func|
      next if visited_funcs.include?(func)
      visited_funcs.add(func)
      func.outputs.each do |y_ref|
        y = y_ref.__getobj__
        if y && forward_graph[y]
          candidate_vars << y
        end
      end
    end
  end

  # 3. Backpropagation: the backpropagation is executed along the
  #    (sub-)subgraph. It uses the topological order of the subgraph which is
  #    induced by the reversed order of function applications ("rank").
  grads = {} # mapping from variable nodes to their gradients

  # Initialize the gradient mapping.
  grad_outputs = [nil] * outputs.size if grad_outputs.nil?
  outputs.zip(grad_outputs).each do |y, gy|
    if gy.nil?
      gy_data = y.data.new_ones
      gy = Chainer::Variable.new(gy_data, requires_grad: false)
    end
    grads[y.node] = gy
  end

  unless grad_inputs.nil?
    inputs.zip(grad_inputs).each do |x, gx|
      grads[x.node] = gx unless gx.nil?
    end
  end

  # Backprop implementation. It edits grads which will only contain the
  # gradients w.r.t. the inputs.
  old_enable_backprop = Chainer.configuration.enable_backprop
  Chainer.configuration.enable_backprop = enable_double_backprop
  backprop(outputs, inputs, grad_required, retain_grad, grads)
  Chainer.configuration.enable_backprop = old_enable_backprop

  # Extract the gradients w.r.t. the inputs and return them.
  ret = inputs.map { |x| grads[x.node] }
  if set_grad
    inputs.zip(ret).each do |x, gx|
      x.grad_var = gx
    end
  end

  ret
end
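A minimal sketch of calling this method. It is defined as a module function, so it is invoked as Chainer.grad; the y = x * x expression is an assumed example:

  x = Chainer::Variable.new(Numo::DFloat[1.0, 2.0, 3.0])
  y = x * x                     # assumes Chainer::Variable supports element-wise *
  gx, = Chainer.grad([y], [x])  # gradients are returned without touching x.grad (set_grad: false)
  gx.data                       # expected to be roughly Numo::DFloat[2.0, 4.0, 6.0]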
Computes numerical gradient by finite differences.

This function is used to implement gradient checks. For a usage example, see the unit tests of Chainer::Functions.

@param [Method, Proc] f Ruby function with no arguments that runs the forward
  computation and returns the result.
@param [Array<Numo::NArray>] inputs Array of arrays that should be treated as
  inputs. Each element of them is slightly modified to realize the numerical gradient by finite differences.
@param [Array<Numo::NArray>] grad_outputs Array of arrays that are treated as
  output gradients.
@param [Float] eps Epsilon value of the finite differences.
@return [Array] Numerical gradient arrays corresponding to +inputs+.
# File lib/chainer/gradient_check.rb, line 21
def numerical_grad(f, inputs, grad_outputs, eps=1e-3)
  raise unless eps > 0
  inputs = inputs.to_a
  grad_outputs = grad_outputs.to_a
  grads = inputs.map { |x| x.new_zeros() }

  inputs.zip(grads).each do |x, gx|
    orig_x = x.dup # hold original value
    x.each_with_index{|_, *i|
      orig = orig_x[*i]
      x[*i] = orig + eps
      ys1 = _copy_arrays(f.())
      x[*i] = orig - eps
      ys2 = _copy_arrays(f.())
      x[*i] = orig

      ys1.zip(ys2, grad_outputs).each do |y1, y2, gy|
        next if gy.nil?
        diff = y1 - y2
        if Chainer.array?(diff) && diff.empty?
          dot = 0
        else
          dot = (diff * gy).sum
        end
        gx[*i] += dot / (2 * eps)
      end
    }
  end

  return grads
end
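A minimal sketch of using numerical_grad directly. The forward lambda closes over the input array, which this function perturbs in place:

  x  = Numo::DFloat[1.0, 2.0, 3.0]
  gy = Numo::DFloat.ones(3)   # upstream gradient
  f  = -> { [x * x] }         # forward computation returning an Array of output arrays
  grads = Chainer.numerical_grad(f, [x], [gy], 1e-3)
  grads[0]                    # expected to be roughly Numo::DFloat[2.0, 4.0, 6.0]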
Private Class Methods
# File lib/chainer/function_node.rb, line 347
def self.backprop(outputs, inputs, grad_required, retain_grad, grads)
  candidate_funcs = []
  visited_funcs = Set.new

  push_candidate = -> (func) do
    return if visited_funcs.include?(func)

    # Keep candidate_funcs sorted by rank so that pop_candidate
    # returns the function with the largest rank first.
    visited_funcs.add(func)
    candidate_funcs.unshift(func)
    candidate_funcs.sort_by! { |f| f.rank }
  end

  pop_candidate = -> () do
    candidate_funcs.pop
  end

  outputs.each do |y|
    creator = y.creator_node
    next if creator.nil?
    push_candidate.(creator)
  end

  input_nodes = Set.new(inputs.map(&:node))

  while func = pop_candidate.()
    # Collect the gradients w.r.t. the outputs
    gys = []

    func.outputs.each do |y_ref|
      y = y_ref.__getobj__
      if y.nil?
        gys << nil
        next
      end
      gys << grads[y]
    end

    # Collect the gradients w.r.t. the inputs
    #
    # Note (Tokui): when the same variable is passed multiple times as
    # inputs in the same function (e.g. an expression like f(x, x)), the
    # current implementation passes None as the current gradient w.r.t.
    # such an input except for the first one (i.e., it builds gxs like
    # (gx, None) where gx is the current gradient w.r.t. x).
    gxs = []
    input_indexes = []
    selected_inputs = Set.new
    func.inputs.each_with_index do |x, i|
      next unless grad_required.include?(x)

      input_indexes << i
      if selected_inputs.include?(x)
        gxs << nil
      else
        gxs << grads[x]
        selected_inputs.add(x)
      end
    end

    next if input_indexes.empty?

    # Do backward
    new_gxs = func.backward_accumulate(input_indexes, gys, gxs)

    # Delete output gradients that are not required to return
    func.outputs.each do |y_ref|
      y = y_ref.__getobj__
      if y && grads[y] && !input_nodes.include?(y)
        grads.delete(y)
      end
    end

    # Update grads
    selected_inputs = Set.new
    input_indexes.zip(new_gxs).each do |i, g|
      next if g.nil?

      node = func.inputs[i]
      if selected_inputs.include?(node)
        # Accumulate the duplicated gradients here
        cur_gx = grads[node]
        if cur_gx
          g = g + cur_gx
        end
      else
        selected_inputs.add(node)
      end

      grads[node] = g

      if retain_grad
        v = node.get_variable
        if v
          v.grad_var = g
        end
      end

      creator = node.creator_node
      if creator
        push_candidate.(creator)
      end
    end
  end
end
# File lib/chainer/gradient_check.rb, line 316
def clear_grads(xs)
  xs.each do |x|
    x.grad_var = nil
  end
end
# File lib/chainer/gradient_check.rb, line 294
def set_y_grad(y, y_grad)
  if y_grad.nil?
    if y.size != 1
      raise TypeError, 'When `y_grad` is `None`, the function must return a zero-dimentional array'
    end
    y_grad = [1]
  else
    if y.size != y_grad.size
      raise TypeError, '`y_grad` must have the same length of output values'
    end
    y.zip(y_grad).each do |iy, igy|
      if igy.is_a?(Chainer::Variable)
        iy.grad_var = igy
      else
        iy.grad = igy
      end
    end
  end

  y_grad
end
Private Instance Methods

The module functions documented above under Public Class Methods (_as_tuple, _copy_arrays, array?, check_backward, check_double_backward, clear_grads, get_array_module, numerical_grad and set_y_grad) are also exposed as private instance methods. Their documentation and source listings are identical to the ones above.