Handling of diagnostic and forcing datasets should happen in masker and not in tokenizer #1682

@clessig

What happened?

Handling of diagnostic (no source channels) and forcing (no target channels) datasets is currently done in the tokenizer:

if is_diagnostic or rdata.data.shape[1] == 0 or len(rdata.data) < 2:

is_forcing = stream_info.get("forcing", False)

This leads to an inconsistency between the masks and the actual data, e.g. here:

# preds_batch can be empty and output_info not for forcings

Hence, the handling of forcing and diagnostic datasets should be done in the masker, which should generate an empty target mask for forcing datasets and an empty source mask for diagnostic datasets.
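A minimal sketch of what this could look like in the masker, not the actual implementation. The function name `build_masks`, its parameters, the `"diagnostic"` key in `stream_info`, and the choice to mark every token of a forcing stream as source (and every token of a diagnostic stream as target) are assumptions made for illustration; only the `"forcing"` key appears in the current code.

```python
import numpy as np


def build_masks(stream_info, num_tokens, masking_rate=0.5, rng=None):
    """Return (source_mask, target_mask) as boolean arrays of length num_tokens.

    Hypothetical helper: the name, signature and the "diagnostic" key are
    assumptions for illustration, not the project's API.
    """
    rng = np.random.default_rng() if rng is None else rng

    is_forcing = stream_info.get("forcing", False)
    is_diagnostic = stream_info.get("diagnostic", False)  # assumed key
    if is_forcing and is_diagnostic:
        raise ValueError("a stream cannot be both forcing and diagnostic")

    if is_forcing:
        # Forcing stream: input only, so the target mask is empty.
        source_mask = np.ones(num_tokens, dtype=bool)
        target_mask = np.zeros(num_tokens, dtype=bool)
    elif is_diagnostic:
        # Diagnostic stream: output only, so the source mask is empty.
        source_mask = np.zeros(num_tokens, dtype=bool)
        target_mask = np.ones(num_tokens, dtype=bool)
    else:
        # Regular stream: random split into masked (target) and visible (source).
        target_mask = rng.random(num_tokens) < masking_rate
        source_mask = ~target_mask

    return source_mask, target_mask
```

With the decision taken in one place, the tokenizer would only need to apply the masks it receives, so the masks and the tokenized data cannot disagree about which streams contribute sources or targets.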

CC @wael-mika @shmh40

What are the steps to reproduce the bug?

No response

Hedgedoc link to logs and more information. This ticket is public, do not attach files directly.

No response

Metadata

Assignees: No one assigned

Labels: bug (Something isn't working), model (Related to model training or definition (not generic infra))

Status: Todo

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests