"""
:mod:`torch.optim` is a package for optimizing neural networks.
It provides a wide variety of optimization methods such as SGD, Adam etc.
Currently, the following optimization methods are supported, typically with
options such as weight decay and other bells and whistles.
- SGD
- AdaDelta
- Adagrad
- Adam
- AdaMax
- Averaged SGD
- RProp
- RMSProp
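
Most of these options are enabled through keyword arguments passed when
constructing the optimizer. A minimal sketch for weight decay follows; whether
and how a given method supports it depends on the optimizer, so check the
individual optimizer's signature::

    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                          weight_decay=1e-4)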

Using the package itself involves two steps:

1. Construct an optimizer.
2. Use ``optimizer.step(...)`` to optimize.

   - Call ``optimizer.zero_grad()`` to zero out the gradient buffers when
     appropriate.

Constructing the optimizer
--------------------------

One first constructs an ``Optimizer`` object by giving it a list of parameters
to optimize, as well as optimizer options such as the learning rate, weight
decay, etc.

Examples::

    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    optimizer = optim.Adam([var1, var2], lr=0.0001)

Per-parameter options
---------------------

In more advanced usage, one can specify per-layer options by passing parameter
groups, each along with its custom options.
**Any option that is not specified for a parameter group falls back to the
defaults passed to the optimizer.**
This is very useful when one wants to specify per-layer learning rates, for
example.

For example, the invocation::

    optim.SGD([
        {'params': model1.parameters()},
        {'params': model2.parameters(), 'lr': 1e-3}],
        lr=1e-2, momentum=0.9)

means that

* ``model1``'s parameters will use the default learning rate of ``1e-2`` and
  momentum of ``0.9``
* ``model2``'s parameters will use a learning rate of ``1e-3`` and the default
  momentum of ``0.9``
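
The options that each group ends up with can be inspected through the
optimizer's ``param_groups`` attribute. A minimal sketch, assuming the
``optim.SGD`` call above was assigned to a variable named ``optimizer`` and
that, as in current versions of the package, each group is kept as a dict of
its parameters and resolved options::

    for group in optimizer.param_groups:
        # 'params' holds the parameters of the group; the remaining keys are
        # the per-group options merged with the defaults.
        print(group['lr'], group['momentum'])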

Then, you can use the optimizer by calling ``optimizer.zero_grad()`` and
``optimizer.step(...)``, as described in the next sections.

Taking an optimization step using ``step``
-------------------------------------------

The ``step`` function can be used in one of two ways, described below.

``optimizer.step()``
^^^^^^^^^^^^^^^^^^^^

This is a simplified form supported by most optimizers.
The function can be called after computing the gradients with ``backward()``.

Example 1 - training a neural network::

    net = MNISTNet()
    criterion = ClassNLLLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001)

    for data in data_batches:
        input, target = data
        optimizer.zero_grad()
        output = net(input)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

``optimizer.step(closure)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this form, ``step`` takes a user-defined closure that computes f(x) and
returns the loss.
The closure should look somewhat like this::

    def f_closure(x):
        optimizer.zero_grad()
        loss = f(x)
        loss.backward()
        return loss

Example 2 - training a neural network with a closure::

    net = MNISTNet()
    criterion = ClassNLLLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001)

    for data in data_batches:
        input, target = data

        def closure():
            optimizer.zero_grad()
            output = net(input)
            loss = criterion(output, target)
            loss.backward()
            return loss

        optimizer.step(closure)

Note:
    **Why is this supported?**
    Some optimization algorithms, such as Conjugate Gradient and LBFGS, need to
    re-evaluate the function multiple times within a single step. For such
    optimization methods, the function (i.e. the closure) has to be passed in,
    so that the optimizer can call it as many times as it needs to; the closure
    should therefore recompute the forward pass and the gradients on every
    invocation.
"""
from .adadelta import Adadelta
from .adagrad import Adagrad
from .adam import Adam
from .adamax import Adamax
from .asgd import ASGD
from .sgd import SGD
from .rprop import Rprop
from .rmsprop import RMSprop
from .optimizer import Optimizer
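# Remove the submodule names (bound as a side effect of the imports above) from
# the package namespace, so that only the optimizer classes are exposed as
# attributes of torch.optim.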
del adadelta
del adagrad
del adam
del adamax
del asgd
del sgd
del rprop
del rmsprop
del optimizer