
Make training loop and stages explicit? #96

Open
@albertz


All the iterative-looking logic we currently implement, such as:

```python
y = ...
loss = cross_entropy(...)
loss.mark_as_loss()
```

This defines what is calculated per step, so it implicitly assumes a single outer loop over the steps.
Actually, this can also be seen as a model definition. What is really done per step is the model update during training via the optimizer, which is kept separate here. In fact, the optimizer is not really part of returnn-common yet.
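To make the implicit structure concrete, here is a minimal plain-Python sketch (not returnn-common code). The hypothetical `compute_loss` plays the role of the per-step definition above: it only describes what is computed for one batch. The outer loop over steps and the parameter update via the optimizer live outside it, in the hypothetical `train` driver. Plain SGD on a one-parameter linear model stands in for the real optimizer.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]


def compute_loss(w: float, batch: Batch) -> Tuple[float, float]:
    """Per-step definition: squared-error loss of y = w * x and its gradient w.r.t. w.

    Analogous to `y = ...; loss = ...; loss.mark_as_loss()`: it says nothing about looping.
    """
    n = len(batch)
    loss = sum((w * x - y) ** 2 for x, y in batch) / n
    grad = sum(2 * (w * x - y) * x for x, y in batch) / n
    return loss, grad


def train(batches: List[Batch], w: float = 0.0, learning_rate: float = 0.1) -> float:
    """Hypothetical driver: the outer loop over steps plus the optimizer update (plain SGD)."""
    for batch in batches:  # the loop the per-step definition silently assumes
        _, grad = compute_loss(w, batch)
        w -= learning_rate * grad  # the actual per-step update, separate from the definition
    return w


data = [(1.0, 2.0), (2.0, 4.0)]  # samples of y = 2 * x
w = train([data] * 100)
print(round(w, 3))
```

The split mirrors the point above: `compute_loss` could be reused unchanged for evaluation, while everything iterative sits in `train`.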

In any case, mixing this with logic executed at initialization (which also depends on whether this is a new run starting in epoch 1 or a continued run loading a previous checkpoint) and with logic executed per epoch might cause confusion.
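The three contexts can be separated explicitly in a plain-Python driver. This is a minimal sketch, assuming a hypothetical `run_training` function and a checkpoint represented as a dict with `epoch` and `param` keys; it is not how RETURNN structures its training. The init part runs once but branches on new versus continued run, the epoch part runs once per epoch, and the step part once per batch.

```python
from typing import Dict, List, Optional


def run_training(
    num_epochs: int,
    steps_per_epoch: int,
    checkpoint: Optional[Dict] = None,
    log: Optional[List[str]] = None,
) -> Dict:
    """Hypothetical driver making init, per-epoch and per-step contexts explicit."""
    log = [] if log is None else log
    # Init context: runs exactly once, but its behavior depends on the kind of run.
    if checkpoint is None:
        state = {"epoch": 1, "param": 0.0}  # new run: fresh param init, start in epoch 1
        log.append("init: fresh params")
    else:
        state = {"epoch": checkpoint["epoch"] + 1, "param": checkpoint["param"]}
        log.append(f"init: loaded checkpoint of epoch {checkpoint['epoch']}")
    # Epoch context: runs once per epoch (e.g. learning-rate schedule, storing a checkpoint).
    while state["epoch"] <= num_epochs:
        # Step context: runs once per batch (forward pass, loss, optimizer update).
        for _ in range(steps_per_epoch):
            state["param"] += 1.0  # placeholder for the actual parameter update
        log.append(f"epoch {state['epoch']}: stored checkpoint")
        state["epoch"] += 1
    # Return a checkpoint of the last completed epoch.
    return {"epoch": state["epoch"] - 1, "param": state["param"]}


log1: List[str] = []
ckpt = run_training(num_epochs=2, steps_per_epoch=3, log=log1)  # new run
log2: List[str] = []
final = run_training(num_epochs=4, steps_per_epoch=3, checkpoint=ckpt, log=log2)  # continued run
```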

Maybe we should make this more explicit, i.e. define in which context each part of the calculation is executed. This could be done with a context manager, similar to our Loop logic (#16).

At initialization, we would also handle parameter initialization (#59), or maybe also custom checkpoint loading (#93).

A suggestion:

```python
# Create model. This will already set the default param init.
model = Model(...)

model.param.init = ...  # overwrite with some custom init

with nn.epoch_loop():
    ...  # per-epoch logic

    with nn.step_loop():
        x = nn.extern_data(...)
        y = model(x)
        loss = nn.cross_entropy(y, targets)
        loss.mark_as_loss()
```
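The ordering in this suggestion is: the constructor sets a default init, the user may override it, and the actual initialization only happens when the init context runs. The plain-Python mock below shows that ordering. `Parameter`, `Model` and `run_init` are hypothetical names, not returnn-common API. The point is that `param.init` is only a stored function until it is applied, so overwriting it after construction is safe. A continued run would load the checkpoint value instead of calling the init.

```python
import random
from typing import Callable, Optional


class Parameter:
    """Hypothetical parameter: stores an init function, applied later in the init context."""

    def __init__(self, init: Callable[[], float]) -> None:
        self.init = init  # default set at construction time
        self.value: Optional[float] = None  # not initialized yet


class Model:
    """Toy model; creating it already sets a default param init (uniform random)."""

    def __init__(self) -> None:
        self.param = Parameter(init=lambda: random.uniform(-0.1, 0.1))


def run_init(model: Model, checkpoint: Optional[float] = None) -> None:
    """Init context: load from checkpoint if continuing, else apply the (possibly custom) init."""
    model.param.value = checkpoint if checkpoint is not None else model.param.init()


model = Model()
model.param.init = lambda: 0.5  # overwrite with a custom init before the loops
run_init(model)
```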

Old suggestion:

```python
model = Model(...)

with nn.init_ctx():
    ...

with nn.step_loop():
    ...

with nn.epoch_loop():
    ...
```

However, this is not optimal, because param init would probably already happen naturally inside the Model constructor. Basically, everything before the training loop would be part of the init context, just as we have it for Loop.

Also, it does not follow the natural structure: here the epoch loop comes after the step loop, whereas the step loop should be inside the epoch loop.


(This was suggested as part of #93 on the model checkpoint load and store logic.)
