Make training loop and stages explicit?

The iterative-looking logic we currently implement, such as
```
y = ...
loss = cross_entropy(...)
loss.mark_as_loss()
```
defines what is calculated **per-step**. So it assumes one outer loop over the steps.
You can also see this more as a model definition. What is really done per-step is the model update in training via the optimizer, which is separate from this definition. The optimizer is also not really part of returnn-common yet.
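To illustrate the separation, here is a minimal sketch in plain Python. It is not returnn-common API; the toy model `y = w * x`, the names `per_step` and `train`, and the plain SGD update are all assumptions for illustration. The point is only that the per-step function defines the calculation, while the outer loop and the update live elsewhere.

```python
from typing import List, Tuple


def per_step(w: float, x: float, target: float) -> Tuple[float, float]:
    """Model definition: loss and gradient for one step (hypothetical toy model y = w * x)."""
    y = w * x
    loss = (y - target) ** 2
    grad = 2.0 * (y - target) * x  # d(loss)/dw
    return loss, grad


def train(w: float, data: List[Tuple[float, float]], lr: float = 0.1) -> float:
    """The implied outer loop over steps. The update is separate from the definition."""
    for x, target in data:
        _, grad = per_step(w, x, target)
        w = w - lr * grad  # plain SGD, standing in for the optimizer
    return w
```

Here `per_step` corresponds to what we currently write down, and `train` is the part that is only assumed implicitly.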

In any case, mixing this with logic which is done **at initialization** (which also depends on whether this is a new run, starting in epoch 1, or a continued run, loading a previous checkpoint), and with logic done **per-epoch**, can cause confusion.
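As a rough picture of how often each kind of logic runs, here is a plain-Python sketch. The function `run` and its log strings are hypothetical and only show the execution order: init once, then per-epoch logic, then per-step logic inside each epoch.

```python
from typing import List


def run(start_epoch: int, num_epochs: int, steps_per_epoch: int) -> List[str]:
    """Return the order in which the three kinds of logic would be executed."""
    log: List[str] = []
    # At initialization: executed once, and depends on new vs. continued run.
    if start_epoch == 1:
        log.append("init: new run")
    else:
        log.append(f"init: continue from epoch {start_epoch - 1}")
    for epoch in range(start_epoch, num_epochs + 1):
        log.append(f"epoch {epoch}")  # per-epoch logic
        for step in range(steps_per_epoch):
            log.append(f"epoch {epoch} step {step}")  # per-step logic
    return log
```

For example, `run(3, 3, 1)` would describe a continued run which only executes the last epoch.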

Maybe we should make this logic more explicit, i.e. define which part of the calculation is executed in which context. Maybe similar to our `Loop` (#16) logic with a context manager.

At initialization, we would also handle parameter initialization (#59), or maybe also custom checkpoint loading (#93).
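What the init stage would have to decide could look roughly like this. Again a plain-Python sketch: `init_stage` and the `(last_epoch, params)` checkpoint format are hypothetical, and keeping the default init for params missing in the checkpoint is just one assumed variant of custom checkpoint loading.

```python
from typing import Dict, Optional, Tuple

Params = Dict[str, float]


def init_stage(
    default_params: Params, checkpoint: Optional[Tuple[int, Params]]
) -> Tuple[int, Params]:
    """Return (start_epoch, params).

    checkpoint is (last_finished_epoch, saved_params), or None for a new run.
    """
    if checkpoint is None:
        # New run: start in epoch 1 with the default param init.
        return 1, dict(default_params)
    last_epoch, saved = checkpoint
    # Continued run: saved values overwrite the default init.
    # Params missing in the checkpoint keep their default init (an assumption).
    params = dict(default_params)
    params.update(saved)
    return last_epoch + 1, params
```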

A suggestion:

```
# Create model. This will already set default param init.
model = Model(...)

model.param.init = ...  # overwrite with some custom init

with nn.epoch_loop():
  ...

  with nn.step_loop():
    x = nn.extern_data(...)
    targets = nn.extern_data(...)  # targets for the loss
    y = model(x)
    loss = nn.cross_entropy(y, targets)
    loss.mark_as_loss()
```

---

Old suggestion:

```
model = Model(...)

with nn.init_ctx():
  ...

with nn.step_loop():
  ...

with nn.epoch_loop():
  ...
```

However, this is not optimal, because param init would naturally already happen inside the `Model` constructor. Basically, everything before the training loop would be part of the init context, just like we have it for `Loop`.

Also, putting the epoch loop after the step loop does not follow the natural nesting. The step loop should be inside the epoch loop.

---

*(This was suggested as part of #93 on the model checkpoint load and store logic.)*

