"""
:mod:`torch.optim` is a package for optimizing neural networks.
It provides a wide variety of optimization methods such as SGD, Adam etc.
Currently, the following optimization methods are supported, typically with
options such as weight decay and other bells and whistles.
- SGD
- AdaDelta
- Adagrad
- Adam
- AdaMax
- Averaged SGD
- RProp
- RMSProp
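
Most of these options are enabled through keyword arguments passed when
constructing the optimizer. A minimal sketch for weight decay follows; whether
and how a given method supports it depends on the optimizer, so check the
individual optimizer's signature::

    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                          weight_decay=1e-4)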

Using the package itself involves two steps:

1. Construct an optimizer.
2. Use ``optimizer.step(...)`` to optimize.

   - Call ``optimizer.zero_grad()`` to zero out the gradient buffers when
     appropriate.

Constructing the optimizer
--------------------------

One first constructs an ``Optimizer`` object by giving it a list of parameters
to optimize, as well as optimizer options such as the learning rate, weight
decay, etc.

Examples::

    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    optimizer = optim.Adam([var1, var2], lr=0.0001)

Per-parameter options
---------------------

In more advanced usage, one can specify per-layer options by passing parameter
groups, each along with its custom options.
**Any option that is not specified for a parameter group falls back to the
defaults passed to the optimizer.**
This is very useful when one wants to specify per-layer learning rates, for
example.

For example, the invocation::

    optim.SGD([
        {'params': model1.parameters()},
        {'params': model2.parameters(), 'lr': 1e-3}],
        lr=1e-2, momentum=0.9)

means that

* ``model1``'s parameters will use the default learning rate of ``1e-2`` and
  momentum of ``0.9``
* ``model2``'s parameters will use a learning rate of ``1e-3`` and the default
  momentum of ``0.9``
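
The options that each group ends up with can be inspected through the
optimizer's ``param_groups`` attribute. A minimal sketch, assuming the
``optim.SGD`` call above was assigned to a variable named ``optimizer`` and
that, as in current versions of the package, each group is kept as a dict of
its parameters and resolved options::

    for group in optimizer.param_groups:
        # 'params' holds the parameters of the group; the remaining keys are
        # the per-group options merged with the defaults.
        print(group['lr'], group['momentum'])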

Then, you can use the optimizer by calling ``optimizer.zero_grad()`` and
``optimizer.step(...)``, as described in the next sections.

Taking an optimization step using ``step``
-------------------------------------------

The ``step`` function can be used in one of two ways, described below.

``optimizer.step()``
^^^^^^^^^^^^^^^^^^^^

This is a simplified form supported by most optimizers.
The function can be called after computing the gradients with ``backward()``.

Example 1 - training a neural network::

    net = MNISTNet()
    criterion = ClassNLLLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001)

    for data in data_batches:
        input, target = data
        optimizer.zero_grad()
        output = net(input)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

``optimizer.step(closure)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this form, ``step`` takes a user-defined closure that computes f(x) and
returns the loss.
The closure should look somewhat like this::

    def f_closure(x):
        optimizer.zero_grad()
        loss = f(x)
        loss.backward()
        return loss

Example 2 - training a neural network with a closure::

    net = MNISTNet()
    criterion = ClassNLLLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001)

    for data in data_batches:
        input, target = data

        def closure():
            optimizer.zero_grad()
            output = net(input)
            loss = criterion(output, target)
            loss.backward()
            return loss

        optimizer.step(closure)

Note:
    **Why is this supported?**
    Some optimization algorithms, such as Conjugate Gradient and LBFGS, need to
    re-evaluate the function multiple times within a single step. For such
    optimization methods, the function (i.e. the closure) has to be passed in,
    so that the optimizer can call it as many times as it needs to; the closure
    should therefore recompute the forward pass and the gradients on every
    invocation.
"""
from .adadelta import Adadelta
from .adagrad import Adagrad
from .adam import Adam
from .adamax import Adamax
from .asgd import ASGD
from .sgd import SGD
from .rprop import Rprop
from .rmsprop import RMSprop
from .optimizer import Optimizer
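# Remove the submodule names (bound as a side effect of the imports above) from
# the package namespace, so that only the optimizer classes are exposed as
# attributes of torch.optim.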
del adadelta
del adagrad
del adam
del adamax
del asgd
del sgd
del rprop
del rmsprop
del optimizer