Commit
Fix chain rule to not imply second differentiation (tensorflow#3650)
Fixes part of tensorflow#3629.
girving authored Aug 4, 2016
1 parent c5f94b1 commit 9f47ac6
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions tensorflow/g3doc/how_tos/adding_an_op/index.md
@@ -1011,13 +1011,13 @@ function which computes gradients with respect to the ops' inputs given
 gradients with respect to the ops' outputs.
 
 Mathematically, if an op computes \\(y = f(x)\\) the registered gradient op
-converts gradients \\(\partial / \partial y\\) with respect to \\(y\\) into
-gradients \\(\partial / \partial x\\) with respect to \\(x\\) via the chain
-rule:
+converts gradients \\(\partial L/ \partial y\\) of loss \\(L\\) with respect to
+\\(y\\) into gradients \\(\partial L/ \partial x\\) with respect to \\(x\\) via
+the chain rule:
 
-$$\frac{\partial}{\partial x}
-    = \frac{\partial}{\partial y} \frac{\partial y}{\partial x}
-    = \frac{\partial}{\partial y} \frac{\partial f}{\partial x}.$$
+$$\frac{\partial L}{\partial x}
+    = \frac{\partial L}{\partial y} \frac{\partial y}{\partial x}
+    = \frac{\partial L}{\partial y} \frac{\partial f}{\partial x}.$$
 
 In the case of `ZeroOut`, only one entry in the input affects the output, so the
 gradient with respect to the input is a sparse "one hot" tensor. This is
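
For context, the passage being corrected is the gradient-registration section of the adding-an-op guide, where `ZeroOut` copies only the first entry of its input and zeroes the rest, so \\(\partial L/\partial x\\) is \\(\partial L/\partial y\\) scattered into index 0. A minimal Python sketch of such a gradient registration is below; it mirrors the `ZeroOut` gradient example from the same guide, though the helper name `_zero_out_grad` and the exact imports here are assumptions rather than part of this commit:

from tensorflow.python.framework import ops
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import sparse_ops

@ops.RegisterGradient("ZeroOut")
def _zero_out_grad(op, grad):
  """Converts dL/dy (`grad`) into dL/dx via the chain rule.

  Only the first input entry affects the output, so dL/dx is a sparse
  "one hot" tensor: dL/dy at index [0, ..., 0] and zero elsewhere.
  """
  to_zero = op.inputs[0]
  shape = array_ops.shape(to_zero)
  index = array_ops.zeros_like(shape)            # the all-zeros index
  first_grad = array_ops.reshape(grad, [-1])[0]  # dL/dy for the one live entry
  to_zero_grad = sparse_ops.sparse_to_dense([index], shape, first_grad, 0)
  return [to_zero_grad]  # one gradient Tensor per op input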
