|
# :math:`\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5`.

###############################################################
-# You can do many crazy things with autograd!
+# Mathematically, if you have a vector-valued function :math:`\vec{y}=f(\vec{x})`,
+# then the gradient of :math:`\vec{y}` with respect to :math:`\vec{x}`
+# is a Jacobian matrix:
+#
+# .. math::
+#   J=\left(\begin{array}{ccc}
+#   \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
+#   \vdots & \ddots & \vdots\\
+#   \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
+#   \end{array}\right)
+#
+# Generally speaking, ``torch.autograd`` is an engine for computing
+# Jacobian-vector products. That is, given any vector
+# :math:`v=\left(\begin{array}{cccc} v_{1} & v_{2} & \cdots & v_{m}\end{array}\right)^{T}`,
+# it computes the product :math:`J\cdot v`. If :math:`v` happens to be
+# the gradient of a scalar function :math:`l=g\left(\vec{y}\right)`,
+# that is,
+# :math:`v=\left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}`,
+# then by the chain rule, the Jacobian-vector product is the
+# gradient of :math:`l` with respect to :math:`\vec{x}`:
+#
+# .. math::
+#   J\cdot v=\left(\begin{array}{ccc}
+#   \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
+#   \vdots & \ddots & \vdots\\
+#   \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
+#   \end{array}\right)\left(\begin{array}{c}
+#   \frac{\partial l}{\partial y_{1}}\\
+#   \vdots\\
+#   \frac{\partial l}{\partial y_{m}}
+#   \end{array}\right)=\left(\begin{array}{c}
+#   \frac{\partial l}{\partial x_{1}}\\
+#   \vdots\\
+#   \frac{\partial l}{\partial x_{n}}
+#   \end{array}\right)
+#
+# This characteristic of the Jacobian-vector product makes it very
+# convenient to feed external gradients into a model that has a
+# non-scalar output.
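
###############################################################
# To make the identity above concrete, here is a minimal sketch for
# illustration (the tensors ``a`` and ``b`` and the scalar ``l = b.sum()``
# are arbitrary choices, not part of the example that follows). It
# computes the gradient of ``l`` with respect to ``a`` twice: once by
# calling ``backward`` on the scalar ``l``, and once as a Jacobian-vector
# product by passing ``v0 = dl/db`` to ``backward`` on the vector ``b``:

a = torch.randn(3, requires_grad=True)
b = a ** 2                        # a vector-valued function of a
l = b.sum()                       # a scalar function of b
l.backward()                      # fills a.grad with dl/da
grad_via_scalar = a.grad.clone()

a.grad.zero_()                    # clear the accumulated gradient
b = a ** 2                        # rebuild the graph
v0 = torch.ones(3)                # dl/db_i = 1, since l = b.sum()
b.backward(v0)                    # dl/da again, now as a Jacobian-vector product
print(torch.allclose(grad_via_scalar, a.grad))  # True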

+###############################################################
+# Now let's take a look at an example of a Jacobian-vector product:
|
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)
|
###############################################################
-#
-gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
-y.backward(gradients)
+# Now in this case ``y`` is no longer a scalar. ``torch.autograd``
+# cannot compute the full Jacobian directly, but if we just
+# want the Jacobian-vector product, simply pass the vector to
+# ``backward`` as an argument:
+v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
+y.backward(v)
|
print(x.grad)
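
###############################################################
# The same kind of Jacobian-vector product can also be computed without
# accumulating into ``.grad`` by using ``torch.autograd.grad``. A small
# sketch for illustration (``w``, ``z`` and the factor 3 are made up here):

w = torch.randn(3, requires_grad=True)
z = w * 3                                      # dz_i/dw_i = 3
jvp = torch.autograd.grad(z, w, grad_outputs=v)[0]
print(jvp)                                     # equals v * 3
print(torch.allclose(jvp, v * 3))              # True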
|
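###############################################################
# If the full Jacobian is really needed, it can still be assembled one
# Jacobian-vector product at a time by calling ``torch.autograd.grad``
# with one-hot vectors. A rough sketch, using made-up tensors ``p`` and
# ``q``:

p = torch.randn(3, requires_grad=True)
q = p * p                                  # dq_i/dp_j = 2 * p_i when i == j, else 0
rows = []
for i in range(q.numel()):
    one_hot = torch.zeros(q.numel())
    one_hot[i] = 1.0
    # each pass recovers the gradient of q[i] with respect to p
    rows.append(torch.autograd.grad(q, p, grad_outputs=one_hot,
                                    retain_graph=True)[0])
jacobian = torch.stack(rows)               # here: a diagonal matrix with 2 * p
print(jacobian)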
|
|