
clessig (Collaborator) commented Dec 30, 2025

Description

Revise the config from a flat to a nested dict, and simplify the code where possible, in particular where changes are necessary anyway.
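To illustrate what moving from a flat to a nested config means, here is a minimal sketch that converts a flat dict with dotted keys into a nested dict. The dotted-key convention, the `unflatten_config` helper, and the example keys are hypothetical and are not taken from the WeatherGen code base.

```python
from typing import Any


def unflatten_config(flat: dict[str, Any], sep: str = ".") -> dict[str, Any]:
    """Turn {"a.b": 1} into {"a": {"b": 1}} (hypothetical helper, illustration only)."""
    nested: dict[str, Any] = {}
    for key, value in flat.items():
        parts = key.split(sep)
        node = nested
        # Walk or create the intermediate dicts for every path component but the last.
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"key conflict at '{part}' while inserting '{key}'")
            node = child
        # Refuse to overwrite a whole sub-config with a scalar value.
        if isinstance(node.get(parts[-1]), dict):
            raise ValueError(f"key conflict: '{key}' would overwrite a sub-config")
        node[parts[-1]] = value
    return nested


# Hypothetical flat keys, for illustration only.
flat_cfg = {
    "model.hidden_dim": 1024,
    "model.num_heads": 16,
    "training.batch_size": 1,
    "training.lr.max": 1e-4,
}
cfg = unflatten_config(flat_cfg)
print(cfg["training"]["lr"]["max"])
```

With nested sections, a component can be handed only its own sub-dict (for example `cfg["training"]`) instead of the whole flat namespace.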

This PR also enables a more flexible combination of different loss terms, e.g. a physical-space loss combined with a latent loss, as demonstrated in the default config. It also decouples training, validation, and test as much as possible, so that each of them can use a different objective.
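A minimal sketch of how a nested config could express a weighted combination of loss terms, with a separate objective per stage. The section names (`training`, `validation`, `test`), the `losses`/`fn`/`weight` keys, and the toy `mse`/`mae` functions are assumptions for illustration, not the actual WeatherGen config schema or loss classes.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def mse(pred: Vector, target: Vector) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred: Vector, target: Vector) -> float:
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Hypothetical registry mapping config names to loss functions.
LOSS_FNS: dict[str, Callable[[Vector, Vector], float]] = {"mse": mse, "mae": mae}

# Hypothetical nested config: each stage lists its own loss terms and weights.
config = {
    "training": {
        "losses": {
            "physical": {"fn": "mse", "weight": 1.0},
            "latent": {"fn": "mse", "weight": 0.5},
        }
    },
    "validation": {"losses": {"physical": {"fn": "mae", "weight": 1.0}}},
    "test": {"losses": {"physical": {"fn": "mae", "weight": 1.0}}},
}


def stage_loss(stage: str, outputs: dict[str, tuple[Vector, Vector]]) -> float:
    """Weighted sum of the loss terms configured for `stage`.

    `outputs` maps a term name (e.g. "physical", "latent") to (prediction, target).
    """
    total = 0.0
    for term, spec in config[stage]["losses"].items():
        pred, target = outputs[term]
        total += spec["weight"] * LOSS_FNS[spec["fn"]](pred, target)
    return total


outputs = {
    "physical": ([1.0, 2.0], [1.0, 0.0]),  # prediction vs. target in physical space
    "latent": ([0.5, 0.5], [0.0, 1.0]),    # prediction vs. target in latent space
}
print(stage_loss("training", outputs), stage_loss("validation", outputs))
```

Because each stage reads only its own section, validation can, for example, report a physical-space MAE while training optimizes a combined physical and latent objective.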

Issue Number

Closes #1534
Closes #1535

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and written the run_id(s) in a comment: launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the Mattermost channels and/or a design doc
    • for changes of dependencies: the Mattermost software development channel

…culator and various other details cleaned up
…of config is passed to LRScheduler, which leads to major simplifications
github-actions bot added the labels infra (Issues related to infrastructure) and model (Related to model training or definition, not generic infra) on Dec 30, 2025
clessig (Collaborator, Author) commented Dec 31, 2025

forecast, ERA5 + NPP-ATMS + SYNOP

num_samples/batch_size = 1: (520709 : g479xfk6)

0: LossPhysical.ERA5.mse.avg : 9.4735E-02
0: LossPhysical.ERA5.mae.avg : 1.9303E-01
0: LossPhysical.NPPATMS.mse.avg : 1.1044E-02
0: LossPhysical.NPPATMS.mae.avg : 2.9273E-02
0: LossPhysical.SurfaceCombined.mse.avg : 2.6745E-01
0: LossPhysical.SurfaceCombined.mae.avg : 2.8849E-01
0: LossPhysical.loss_avg : 1.8611E-01

num_samples/batch_size = 2: (520711 : ypix0nr7)

0: LossPhysical.ERA5.mse.avg : 1.1328E-01
0: LossPhysical.ERA5.mae.avg : 2.1473E-01
0: LossPhysical.NPPATMS.mse.avg : 1.8830E-02
0: LossPhysical.NPPATMS.mae.avg : 4.5836E-02
0: LossPhysical.SurfaceCombined.mse.avg : 2.9414E-01
0: LossPhysical.SurfaceCombined.mae.avg : 3.0358E-01
0: LossPhysical.loss_avg : 2.0496E-01
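To make the comparison between the two runs easier to read, the logged values can be parsed and set side by side. This is a small helper sketch that assumes the `<rank>: <metric name> : <value>` line format shown above; the values are copied from the two logs, and the names `parse_losses` and `relative_change` are hypothetical.

```python
import re

# Log excerpts copied from the two runs above.
RUN_BS1 = """\
0: LossPhysical.ERA5.mse.avg : 9.4735E-02
0: LossPhysical.ERA5.mae.avg : 1.9303E-01
0: LossPhysical.NPPATMS.mse.avg : 1.1044E-02
0: LossPhysical.NPPATMS.mae.avg : 2.9273E-02
0: LossPhysical.SurfaceCombined.mse.avg : 2.6745E-01
0: LossPhysical.SurfaceCombined.mae.avg : 2.8849E-01
0: LossPhysical.loss_avg : 1.8611E-01
"""

RUN_BS2 = """\
0: LossPhysical.ERA5.mse.avg : 1.1328E-01
0: LossPhysical.ERA5.mae.avg : 2.1473E-01
0: LossPhysical.NPPATMS.mse.avg : 1.8830E-02
0: LossPhysical.NPPATMS.mae.avg : 4.5836E-02
0: LossPhysical.SurfaceCombined.mse.avg : 2.9414E-01
0: LossPhysical.SurfaceCombined.mae.avg : 3.0358E-01
0: LossPhysical.loss_avg : 2.0496E-01
"""

# "<rank>: <metric name> : <value in scientific notation>"
LINE_RE = re.compile(r"^\s*\d+:\s*(\S+)\s*:\s*([-+0-9.Ee]+)\s*$")


def parse_losses(log: str) -> dict[str, float]:
    """Map each metric name in the log to its float value."""
    metrics = {}
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            metrics[m.group(1)] = float(m.group(2))
    return metrics


def relative_change(base: dict[str, float], other: dict[str, float]) -> dict[str, float]:
    """Relative change (other - base) / base for metrics present in both runs."""
    return {k: (other[k] - base[k]) / base[k] for k in base if k in other}


bs1, bs2 = parse_losses(RUN_BS1), parse_losses(RUN_BS2)
for name, rel in relative_change(bs1, bs2).items():
    print(f"{name:40s} {bs1[name]:.4E} -> {bs2[name]:.4E} ({rel:+.1%})")
```

With the values logged above, every metric is higher for num_samples/batch_size = 2 than for num_samples/batch_size = 1.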

clessig moved this to In Progress in WeatherGen-dev on Dec 31, 2025
This was referenced Jan 1, 2026

Labels

infra (Issues related to infrastructure), model (Related to model training or definition, not generic infra)

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

  • Batch_size (num_samples) > 1 is currently broken
  • Convert model config from flat to hierarchical dict

2 participants