
Refactor TrainLogger #29

Open
@kacpnowak

Description


Is your feature request related to a problem? Please describe.

The current logging system for training metrics has two critical shortcomings:

  • Lack of granular logging: It is currently impossible to log loss function values for individual channels within each data stream. This limits visibility into model performance at a per-channel level, hindering detailed analysis.

  • Fragile and unreadable format: Metrics are stored in unstructured plain-text (txt) files that follow no consistent schema, making them hard to read and parse. Any change to the order or content of the logged metrics breaks backward compatibility, so post-processing, visualization, and comparison across runs are error-prone and unmaintainable.

Describe the solution you'd like

Replace the current plain-text logging format with CSV files using multi-level column headers to organize metrics hierarchically.

This structure would:

  • Separate global training statistics (e.g., epoch, total loss) from per-stream and per-channel metrics.

  • Ensure human readability while maintaining machine-readability.

  • Prevent compatibility breaks through explicit column naming and hierarchical organization.

Example:

```
global ,global                     ,global  ,global   ,global   ,global                 ,global              ,FESOM               ,FESOM                ,FESOM               ,FESOM               ,FESOM               ,FESOM              ,FESOM              ,FESOM                 ,FESOM                 ,FESOM                 ,FESOM               ,FESOM               ,FESOM              ,FESOM
step   ,time                       ,samples ,perf_gpu ,perf_mem ,learning_rate          ,loss_mean           ,mse                 ,a_ice                ,evap                ,fh                  ,fw                  ,prec               ,snow               ,ssh                   ,sss                   ,sst                   ,swr                 ,tx_sur              ,ty_sur             ,std
1      ,2025-02-27 14:38:31.542443 ,320     ,98.5     ,48.75    ,2.9802549459833396e-06 ,0.9765826463699341  ,0.9765826463699341  ,0.9043928980827332   ,0.9301251173019409  ,0.972670316696167   ,0.936567485332489   ,0.9416562914848328 ,1.0318517684936523 ,1.0079926252365112    ,1.0319256782531738    ,0.941299319267273     ,0.9138585329055786  ,1.0519397258758545  ,1.054713249206543  ,
```

This solution was implemented in my fork: kacpnowak#3
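As a rough sketch of the proposed layout (names are illustrative, not taken from the fork's implementation), the two-row header can be produced with Python's standard `csv` module: row 1 groups columns by stream (`global`, `FESOM`, ...) and row 2 names the metric.

```python
import csv
import io

# Hypothetical helper sketching the proposed two-row header layout.
# `columns` is a list of (group, metric) pairs; `rows` holds the values.
def write_metrics(fh, columns, rows):
    writer = csv.writer(fh)
    writer.writerow([group for group, _ in columns])  # row 1: stream/group
    writer.writerow([name for _, name in columns])    # row 2: metric name
    writer.writerows(rows)                            # data rows

columns = [("global", "step"), ("global", "loss_mean"),
           ("FESOM", "sst"), ("FESOM", "sss")]
buf = io.StringIO()
write_metrics(buf, columns, [[1, 0.9766, 0.9413, 1.0319]])
print(buf.getvalue())
```

Such a file can be read back into a hierarchical structure with `pandas.read_csv(path, header=[0, 1])`, which yields a `MultiIndex` over (stream, metric) columns, so per-channel series stay addressable even as columns are added.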

Describe alternatives you've considered

An alternative solution is to use an embedded database (e.g., DuckDB or SQLite) to store metrics in structured tables. Benefits include:

  • Support for complex queries across multiple training runs.
  • Native schema enforcement, eliminating fragility caused by format changes.
  • Efficient storage and retrieval of large-scale experiments.

However, CSV files provide a simpler, more accessible intermediate solution that meets immediate needs without introducing database dependencies. Either approach would be preferable to the current system, which is unmaintainable and error-prone due to its reliance on unstructured text.
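For comparison, a minimal sketch of the embedded-database alternative using Python's built-in `sqlite3`; the table and column names here are assumptions for illustration, not a concrete schema proposal:

```python
import sqlite3

# Illustrative long-format schema: one row per (run, step, stream, channel).
# All identifiers below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metrics (
        run_id  TEXT    NOT NULL,
        step    INTEGER NOT NULL,
        stream  TEXT    NOT NULL,  -- e.g. 'global' or 'FESOM'
        channel TEXT    NOT NULL,  -- e.g. 'loss_mean' or 'sst'
        value   REAL    NOT NULL
    )
""")
conn.execute("INSERT INTO metrics VALUES ('run-a', 1, 'FESOM', 'sst', 0.9413)")

# Schema enforcement plus SQL makes cross-run comparison a single query:
rows = conn.execute(
    "SELECT step, value FROM metrics WHERE stream='FESOM' AND channel='sst'"
).fetchall()
print(rows)  # [(1, 0.9413)]
```

The long format sidesteps the column-ordering fragility entirely, since new channels become new rows rather than new columns.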

Additional context

The current logging implementation’s lack of structure and flexibility actively impedes debugging, analysis, and iterative improvements. A structured format (CSV or database) is critical for scaling experimentation and ensuring reproducibility.

Organisation

AWI


Metadata


    Labels

    enhancement (New feature or request)

    Projects

    • Status

      Concept phase