
gradient – Symbolic Differentiation

Platforms: Unix, Windows

Symbolic gradients are usually computed with tensor.grad(), which offers a more convenient syntax for the common case of wanting the gradient of a scalar cost with respect to some expressions. The grad_sources_inputs() function does the underlying work and is more flexible, but it is also more awkward to use when tensor.grad() can do the job.

Driver for gradient calculations.

exception theano.gradient.GradientError(arg, err_pos, abs_err, rel_err, abs_tol, rel_tol)

This error is raised when a computed gradient is found to be incorrect.

theano.gradient.Lop(f, wrt, eval_points, consider_constant=None, warn_type=False, disconnected_inputs='raise')

Computes the L operation on f with respect to wrt, evaluated at the points given in eval_points. Mathematically, this is the Jacobian of f with respect to wrt, left-multiplied by the eval points.

Return type: Variable or list/tuple of Variables, depending on the type of f
Returns: symbolic expression such that L_op[j] = sum_i ( d f[i] / d wrt[j] ) eval_point[i], where the indices in that expression are magic multidimensional indices that specify both the position within a list and all coordinates of the tensor element at that position. If f is a list/tuple, then return a list/tuple with the results.

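To make the index convention concrete, the following plain-Python sketch (no Theano involved) evaluates the L operation numerically for a small example function with a hand-written Jacobian. The example function and the helpers `jac_f` and `lop` are hypothetical illustrations, not part of theano.gradient. The list `v` plays the role of eval_points.

```python
import math


def jac_f(x):
    """Analytic Jacobian of the example f(x0, x1) = [x0 * x1, x0 + x1, sin(x0)].

    jac[i][j] = d f[i] / d x[j]; rows index outputs, columns index inputs.
    """
    x0, x1 = x
    return [
        [x1, x0],
        [1.0, 1.0],
        [math.cos(x0), 0.0],
    ]


def lop(jac, v):
    """L operation: out[j] = sum_i v[i] * jac[i][j] (one entry per input)."""
    n_outputs, n_inputs = len(jac), len(jac[0])
    return [sum(v[i] * jac[i][j] for i in range(n_outputs)) for j in range(n_inputs)]


# v has one entry per output of f, like eval_points in Lop.
print(lop(jac_f([0.0, 2.0]), [1.0, 2.0, 3.0]))  # [7.0, 2.0]
```

The result has one entry per input (the shape of wrt). This is the vector-Jacobian product that reverse-mode differentiation propagates.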
theano.gradient.Rop(f, wrt, eval_points)

Computes the R operation on f with respect to wrt, evaluated at the points given in eval_points. Mathematically, this is the Jacobian of f with respect to wrt, right-multiplied by the eval points.

Return type: Variable or list/tuple of Variables, depending on the type of f
Returns: symbolic expression such that R_op[i] = sum_j ( d f[i] / d wrt[j] ) eval_point[j], where the indices in that expression are magic multidimensional indices that specify both the position within a list and all coordinates of the tensor element at that position. If wrt is a list/tuple, then return a list/tuple with the results.

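Here is a companion sketch for the R operation, again in plain Python with hypothetical helpers (`f`, `jac_f`, `rop`, `directional_derivative`). It also checks the identity that makes the R operation useful: the Jacobian times a vector v equals the directional derivative of f along v. Central differences approximate that directional derivative.

```python
import math


def f(x):
    """Example map from R^2 to R^3 (hypothetical, for illustration only)."""
    x0, x1 = x
    return [x0 * x1, x0 + x1, math.sin(x0)]


def jac_f(x):
    """Analytic Jacobian of f: jac[i][j] = d f[i] / d x[j]."""
    x0, x1 = x
    return [[x1, x0], [1.0, 1.0], [math.cos(x0), 0.0]]


def rop(jac, v):
    """R operation: out[i] = sum_j jac[i][j] * v[j] (one entry per output)."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in jac]


def directional_derivative(fn, x, v, eps=1e-6):
    """Central-difference estimate of d/dt fn(x + t * v) at t = 0."""
    plus = fn([xi + eps * vi for xi, vi in zip(x, v)])
    minus = fn([xi - eps * vi for xi, vi in zip(x, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]


x, v = [0.5, 2.0], [1.0, -1.0]
exact = rop(jac_f(x), v)
approx = directional_derivative(f, x, v)
print(exact)
print(approx)  # agrees with exact to roughly 1e-9
```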
theano.gradient.format_as(use_list, use_tuple, outputs)

Formats the outputs according to the flags use_list and use_tuple. If use_list is True, outputs is returned as a list (if outputs is not a list or a tuple, it is converted into a one-element list). If use_tuple is True, outputs is returned as a tuple (if outputs is not a list or a tuple, it is converted into a one-element tuple). Otherwise (if both flags are False), outputs is returned unchanged.
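The following minimal sketch reproduces the behaviour described above. It is written from this description rather than from the Theano source. The documentation does not say what happens when both flags are True, so this sketch gives use_list precedence, which is an assumption.

```python
def format_as(use_list, use_tuple, outputs):
    """Sketch of the documented behaviour (not the Theano implementation)."""
    if use_list or use_tuple:
        # Wrap a lone object so it becomes a one-element container.
        if not isinstance(outputs, (list, tuple)):
            outputs = [outputs]
        # Assumption: use_list wins if both flags are set.
        return list(outputs) if use_list else tuple(outputs)
    return outputs


print(format_as(True, False, 3))        # [3]
print(format_as(False, True, [1, 2]))   # (1, 2)
print(format_as(False, False, "out"))   # out
```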

theano.gradient.grad(cost, wrt, g_cost=None, consider_constant=None, warn_type=False, disconnected_inputs='raise')
Parameters:
  • g_cost (Scalar Variable, or None.) – an expression for the gradient through cost. The default is ones_like(cost).
  • consider_constant – a list of expressions not to backpropagate through
  • warn_type – a value of True will cause warnings to be logged for any Op that emits a gradient that does not match its input type.
  • disconnected_inputs (string) – Defines the behaviour if some of the variables in wrt are not part of the computational graph computing cost (or if all links are non-differentiable). The possible values are:
      - ‘ignore’: consider the gradient on these parameters to be zero.
      - ‘warn’: consider the gradient zero, and print a warning.
      - ‘raise’: raise an exception.
Return type:

Variable or list/tuple of Variables (depending upon wrt)

Returns:

symbolic expression of the gradient of cost with respect to wrt. If an element of wrt is not differentiable with respect to the output, a zero variable is returned. The result has the same type as wrt in all cases: a list/tuple or a Variable.

This function is a wrapper around the more general function theano.gradient.grad_sources_inputs().

theano.gradient.grad_sources_inputs(sources, graph_inputs, warn_type=True)

A gradient source is a pair (v, g_v), in which v is a Variable, and g_v is a Variable that is a gradient wrt v. More specifically, g_v is the gradient of an external scalar cost, cost (that is not explicitly used), wrt v.

This function traverses the graph backward from the sources, calling op.grad(...) for every op that has a non-None gradient on at least one of its outputs, to compute the gradients of cost with respect to intermediate variables and graph_inputs.

The op.grad(...) functions are called like this:

op.grad(op.inputs[:], [total_gradient(v) for v in op.outputs])

This call to op.grad should return a list or tuple containing one symbolic gradient per input. These gradients are gradients of the same implicit cost mentioned above, with respect to op.inputs. Note that this is not the same as the gradient of op.outputs with respect to op.inputs.

If op has a single input, op.grad should still return a list or tuple of length 1. For each input with respect to which op is not differentiable, it should return None instead of a Variable instance.

If a source r receives a gradient from another source r2, then the effective gradient on r is the sum of both gradients.

Parameters:
  • sources (list of pairs of Variable: (v, gradient-on-v) to initialize the total_gradient dictionary) – gradients to back-propagate using the chain rule
  • graph_inputs (list of Variable) – variables considered to be constant (do not backpropagate through them)
  • warn_type (bool) – True will trigger warnings via the logging module when the gradient on an expression has a different type than the original expression
Return type:

dictionary whose keys and values are of type Variable

Returns:

mapping from each Variable encountered in the backward traversal to the gradient with respect to that Variable.

It is assumed that there is some objective J shared by all members of sources, so that for each v, gradient-on-v is the gradient of J with respect to v.
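The traversal and accumulation rules above can be illustrated with a tiny numeric analogue. This is a hypothetical sketch, not Theano code. The classes `Variable`, `Mul` and `Add` and the functions `make_node` and `backprop` are invented for illustration. The sketch makes several simplifications:

* Gradients are plain floats rather than symbolic Variables.
* Every op has a single output.
* graph_inputs and warn_type are ignored.

Each op's grad receives its inputs and the total gradient on its output, and returns one gradient per input (or None), mirroring the op.grad(...) call described above.

```python
class Variable:
    """A graph node: a value plus the (op, inputs) pair that produced it, if any."""

    def __init__(self, name, value, owner=None):
        self.name = name
        self.value = value
        self.owner = owner  # None for graph inputs


class Mul:
    """z = x * y; grad gets the input nodes and the gradient on z."""

    def grad(self, inputs, output_grads):
        x, y = inputs
        (gz,) = output_grads
        return [gz * y.value, gz * x.value]


class Add:
    """z = x + y."""

    def grad(self, inputs, output_grads):
        (gz,) = output_grads
        return [gz, gz]


def make_node(op, fn, name, *inputs):
    """Evaluate fn on the input values and record how the result was built."""
    return Variable(name, fn(*(v.value for v in inputs)), owner=(op, list(inputs)))


def backprop(sources):
    """Propagate (variable, gradient) source pairs backward through the graph."""
    total = {}
    for v, g in sources:
        total[v] = total.get(v, 0.0) + g

    # Post-order DFS yields a topological order (inputs before outputs).
    order, seen = [], set()

    def visit(v):
        if v in seen:
            return
        seen.add(v)
        if v.owner is not None:
            for inp in v.owner[1]:
                visit(inp)
        order.append(v)

    for v, _ in sources:
        visit(v)

    # Outputs come before inputs here, so total[v] is complete when used.
    for v in reversed(order):
        if v.owner is None or v not in total:
            continue
        op, inputs = v.owner
        for inp, g in zip(inputs, op.grad(inputs, [total[v]])):
            if g is not None:  # None means "not differentiable wrt this input"
                total[inp] = total.get(inp, 0.0) + g
    return total


x = Variable("x", 3.0)
y = Variable("y", 4.0)
a = make_node(Mul(), lambda p, q: p * q, "a", x, y)  # a = x * y
b = make_node(Add(), lambda p, q: p + q, "b", a, x)  # b = a + x

grads = backprop([(b, 1.0)])
print(grads[x], grads[y])  # 5.0 3.0  (db/dx = y + 1, db/dy = x)
```

Suppose both (b, 1.0) and (a, 2.0) are passed as sources. The total gradient on a is then 3.0: 2.0 from its own source plus 1.0 propagated from b. This is the summation rule stated above for a source that also receives a gradient from another source.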

theano.gradient.hessian(cost, wrt, consider_constant=None, warn_type=False, disconnected_inputs='raise')
Parameters:
  • consider_constant – a list of expressions not to backpropagate through
  • warn_type – a value of True will cause warnings to be logged for any Op that emits a gradient that does not match its input type.
  • disconnected_inputs (string) – Defines the behaviour if some of the variables in wrt are not part of the computational graph computing cost (or if all links are non-differentiable). The possible values are:
      - ‘ignore’: consider the gradient on these parameters to be zero.
      - ‘warn’: consider the gradient zero, and print a warning.
      - ‘raise’: raise an exception.
Returns:

either an instance of Variable or a list/tuple of Variables (depending upon wrt) representing the Hessian of the cost with respect to (elements of) wrt. If an element of wrt is not differentiable with respect to the output, a zero variable is returned. The return value has the same type as wrt in all cases: a list/tuple or a TensorVariable.
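The following numeric sketch shows what the returned expression represents. It approximates the Hessian of a small scalar cost with second-order central differences. Theano builds the Hessian symbolically. The function `hessian_fd` and the example cost are hypothetical and only show the matrix of second derivatives being computed.

```python
def cost(p):
    """Hypothetical scalar cost f(x, y) = x**2 * y + y**3."""
    x, y = p
    return x ** 2 * y + y ** 3


def hessian_fd(fn, p, eps=1e-3):
    """Second-order central differences: h[i][j] ~ d2 fn / (dp[i] dp[j])."""
    n = len(p)

    def shifted(i, si, j, sj):
        # Move coordinate i by si*eps and coordinate j by sj*eps (may be the same).
        q = list(p)
        q[i] += si * eps
        q[j] += sj * eps
        return fn(q)

    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            h[i][j] = (shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
                       - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * eps ** 2)
    return h


H = hessian_fd(cost, [1.0, 2.0])
# Analytic Hessian [[2y, 2x], [2x, 6y]] at (1, 2) is [[4, 2], [2, 12]].
print(H)
```

The example cost is a cubic polynomial, so these differences are exact up to rounding. As expected for a Hessian, the result is symmetric.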

theano.gradient.jacobian(expression, wrt, consider_constant=None, warn_type=False, disconnected_inputs='raise')
Parameters:
  • consider_constant – a list of expressions not to backpropagate through
  • warn_type – a value of True will cause warnings to be logged for any Op that emits a gradient that does not match its input type.
  • disconnected_inputs (string) – Defines the behaviour if some of the variables in wrt are not part of the computational graph computing expression (or if all links are non-differentiable). The possible values are:
      - ‘ignore’: consider the gradient on these parameters to be zero.
      - ‘warn’: consider the gradient zero, and print a warning.
      - ‘raise’: raise an exception.
Returns:

either an instance of Variable or a list/tuple of Variables (depending upon wrt) representing the Jacobian of expression with respect to (elements of) wrt. If an element of wrt is not differentiable with respect to the output, a zero variable is returned. The return value has the same type as wrt in all cases: a list/tuple or a TensorVariable.
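The Jacobian returned here is the matrix of first derivatives of each element of expression with respect to each element of wrt. The hypothetical `jacobian_fd` below approximates that matrix numerically with central differences, one column per input. This can be handy for sanity-checking a symbolic result on small inputs. The example expression is also an assumption made for illustration.

```python
import math


def expression(x):
    """Hypothetical vector-valued expression of a 2-element input."""
    return [x[0] ** 2 * x[1], math.exp(x[1]), x[0] - x[1]]


def jacobian_fd(fn, x, eps=1e-6):
    """Central-difference Jacobian: jac[i][j] ~ d fn(x)[i] / d x[j]."""
    n_out = len(fn(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        plus = list(x)
        plus[j] += eps
        minus = list(x)
        minus[j] -= eps
        fp, fm = fn(plus), fn(minus)
        for i in range(n_out):
            jac[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return jac


J = jacobian_fd(expression, [1.0, 0.0])
# Analytic: [[2*x0*x1, x0**2], [0, exp(x1)], [1, -1]] = [[0, 1], [0, 1], [1, -1]]
print(J)
```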

class theano.gradient.numeric_grad(f, pt, eps=None)

Compute the numeric derivative of a scalar-valued function at a particular point.
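Below is a minimal sketch of a numeric derivative for a scalar-valued function of a flat list of floats. It uses one-sided (forward) differences. This description does not state whether numeric_grad uses exactly this scheme, or how it handles several ndarray inputs. Treat `numeric_gradient` and the example `f` as hypothetical illustrations.

```python
def numeric_gradient(f, pt, eps=1e-7):
    """Forward-difference estimate of the gradient of scalar f at pt."""
    f0 = f(pt)
    grad = []
    for i in range(len(pt)):
        shifted = list(pt)
        shifted[i] += eps
        grad.append((f(shifted) - f0) / eps)
    return grad


def f(p):
    """Hypothetical scalar function x**2 + 3*x*y."""
    return p[0] ** 2 + 3.0 * p[0] * p[1]


g = numeric_gradient(f, [1.0, 2.0])
# Exact gradient is [2*x + 3*y, 3*x] = [8.0, 3.0].
print(g)
```

Forward differences have an error roughly proportional to eps. Here the first component is off by about 1e-7.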

static abs_rel_err(a, b)

Return absolute and relative error between a and b.

The relative error is a small number when a and b are close, relative to how big they are.

Formulas used:
    abs_err = abs(a - b)
    rel_err = abs_err / max(abs(a) + abs(b), 1e-8)

The denominator is clipped at 1e-8 to avoid dividing by 0 when a and b are both close to 0.

The tuple (abs_err, rel_err) is returned.
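The formulas translate directly into a small scalar function. This sketch covers the formulas only. The real static method presumably applies them to whole ndarrays.

```python
def abs_rel_err(a, b):
    """Scalar sketch of the documented absolute/relative error formulas."""
    abs_err = abs(a - b)
    # Clip the denominator so two near-zero values do not divide by ~0.
    rel_err = abs_err / max(abs(a) + abs(b), 1e-8)
    return abs_err, rel_err


print(abs_rel_err(2.0, 1.0))    # (1.0, 0.3333333333333333)
print(abs_rel_err(1e-10, 0.0))  # tiny abs_err; rel_err is about 0.01 because of the clip
```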

abs_rel_errors(g_pt)

Return the absolute and relative errors of the gradient estimate g_pt.

g_pt must be a list of ndarrays of the same length as self.gf, otherwise a ValueError is raised.

Corresponding ndarrays in g_pt and self.gf must have the same shape or ValueError is raised.

max_err(g_pt, abs_tol, rel_tol)

Find the biggest error between g_pt and self.gf.

What is measured is the violation of the absolute and relative errors with respect to the provided tolerances (abs_tol, rel_tol). A value > 1 means both tolerances are exceeded.

Return the argmax of min(abs_err / abs_tol, rel_err / rel_tol) over g_pt, as well as abs_err and rel_err at this point.
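The selection rule can be sketched on flat lists of per-element errors. The `max_err` below is a hypothetical plain-Python version that takes precomputed absolute and relative errors. The real method instead receives g_pt and compares it against self.gf.

```python
def max_err(abs_errs, rel_errs, abs_tol, rel_tol):
    """Return (position, abs_err, rel_err) at the worst tolerance violation.

    The score min(abs_err / abs_tol, rel_err / rel_tol) exceeds 1 only
    where both tolerances are exceeded.
    """
    scores = [min(a / abs_tol, r / rel_tol) for a, r in zip(abs_errs, rel_errs)]
    worst = max(range(len(scores)), key=lambda k: scores[k])
    return worst, abs_errs[worst], rel_errs[worst]


abs_errs = [1e-6, 1e-3, 5e-4]
rel_errs = [1e-5, 1e-5, 1e-2]
pos, a, r = max_err(abs_errs, rel_errs, abs_tol=1e-4, rel_tol=1e-4)
print(pos)  # 2
```

Entry 1 has the largest absolute error. However, its relative error is within tolerance, so its score stays below 1. Entry 2 violates both tolerances and is therefore reported.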

theano.gradient.unimplemented_grad(op, x_pos, x)

DO NOT USE. Remove this function after all usage of it has been removed from theano.

Return an un-computable symbolic variable of type x.type.

If any function tries to compute this un-computable variable, an exception (NotImplementedError) will be raised, indicating that the gradient on the x_pos-th input of op has not been implemented.

theano.gradient.verify_grad(fun, pt, n_tests=2, rng=None, eps=None, abs_tol=None, rel_tol=None, mode=None, cast_to_output_type=False)

Test a gradient using the Finite Difference Method. Raise an error on failure.

Example:
>>> import numpy
>>> import theano
>>> from theano.gradient import verify_grad
>>> verify_grad(theano.tensor.tanh,
...             (numpy.asarray([[2, 3, 4], [-1, 3.3, 9.9]]),),
...             rng=numpy.random)

Raises an Exception if the difference between the analytic gradient and the numerical gradient (computed through the Finite Difference Method) of a random projection of fun’s output to a scalar exceeds the given tolerance.

Parameters:
  • fun – a Python function that takes Theano variables as inputs, and returns a Theano variable. For instance, an Op instance with a single output.
  • pt – the list of numpy.ndarrays to use as input values. These arrays must be either float32 or float64 arrays.
  • n_tests – number of times to run the test
  • rng – random number generator used to sample u; the test checks the gradient of sum(u * fun) at pt
  • eps – stepsize used in the Finite Difference Method (Default None is type-dependent)
  • abs_tol – absolute tolerance used as threshold for gradient comparison
  • rel_tol – relative tolerance used as threshold for gradient comparison
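The random-projection check described above can be sketched in plain Python under a few assumptions:

* The names `GradientMismatch`, `verify_grad_sketch`, `tanh_all` and `tanh_grad` are hypothetical.
* The analytic gradient of sum(u * fun) is supplied by hand, because there is no symbolic graph here.
* The numeric side uses central differences, which may differ from what verify_grad does internally.
* Failure is declared only when both tolerances are exceeded, following the max_err convention above.

```python
import math
import random


class GradientMismatch(Exception):
    """Hypothetical stand-in for a gradient-check failure."""


def verify_grad_sketch(fun, grad_of_projection, pt, n_tests=2, eps=1e-6,
                       abs_tol=1e-6, rel_tol=1e-6, rng=None):
    """Check grad_of_projection(pt, u) against d/dpt sum(u * fun(pt))."""
    rng = rng if rng is not None else random.Random(0)
    for _ in range(n_tests):
        # Random projection u turns the vector output into a scalar cost.
        u = [rng.uniform(-1.0, 1.0) for _ in fun(pt)]

        def cost(x):
            return sum(ui * yi for ui, yi in zip(u, fun(x)))

        analytic = grad_of_projection(pt, u)
        for j in range(len(pt)):
            plus, minus = list(pt), list(pt)
            plus[j] += eps
            minus[j] -= eps
            numeric = (cost(plus) - cost(minus)) / (2 * eps)
            abs_err = abs(numeric - analytic[j])
            rel_err = abs_err / max(abs(numeric) + abs(analytic[j]), 1e-8)
            if abs_err > abs_tol and rel_err > rel_tol:
                raise GradientMismatch(
                    f"input {j}: abs_err={abs_err:.3g}, rel_err={rel_err:.3g}")


def tanh_all(x):
    return [math.tanh(v) for v in x]


def tanh_grad(x, u):
    # d/dx_i sum_k u_k * tanh(x_k) = u_i * (1 - tanh(x_i)**2)
    return [ui * (1.0 - math.tanh(v) ** 2) for ui, v in zip(u, x)]


pt = [2.0, 3.0, 4.0, -1.0, 3.3, 9.9]  # the example input above, flattened
verify_grad_sketch(tanh_all, tanh_grad, pt)  # passes silently
```

Replacing tanh_grad with an incorrect formula, such as u_i * (1 - tanh(x_i)), makes the sketch raise GradientMismatch.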
Note:

WARNING to unit-test writers: if fun builds a graph, try to make it a SMALL graph. verify_grad is often run in debug mode, which can be very slow if it has to verify a lot of intermediate computations.

Note:

This function does not support ops with multiple outputs. tests/test_scan.py contains an experimental verify_grad that also covers that case by using random projections.