LightningCLI doesn't fail when config.yaml contains invalid arguments #20337
Bug description
I was playing around with the `LightningCLI` and found out that it can still work even if the `config.yaml` contains invalid data types. For example, `max_epochs` for `Trainer` should be an `int`. However, it still succeeds with a `str` in the `.yaml`. In the MWE, you can see that `config.yaml` contains a `str` for both `seed_everything` and `max_epochs`. This is also evident when reading back the `config.yaml` file:
Note
I am not sure if this is really a bug, since it might be the case that the `LightningCLI` converts the given data types to the correct ones based on the type hints (see the check sketched after the log below). However, I couldn't find out whether this is really the case.

What version are you seeing the problem on?
v2.4
How to reproduce the bug
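A hypothetical reconstruction of the MWE, consistent with the log below (a single `Linear` layer named `l1` with 330 params, and a `val_dataloader` but no `validation_step`); apart from the file names `main.py` and `config.yaml`, all names and values are assumptions:

```yaml
# config.yaml (sketch): both values are quoted, i.e. YAML strings,
# yet `fit` runs without a type error.
seed_everything: "1042"
trainer:
  max_epochs: "2"
  accelerator: gpu
  devices: 2
```

```python
# main.py (sketch)
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from lightning.pytorch import LightningDataModule, LightningModule
from lightning.pytorch.cli import LightningCLI


class Model(LightningModule):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(32, 10)  # 330 trainable params, as in the log

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.l1(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


class Data(LightningDataModule):
    def _loader(self):
        x = torch.randn(128, 32)
        y = torch.randint(0, 10, (128,))
        return DataLoader(TensorDataset(x, y), batch_size=2)

    def train_dataloader(self):
        return self._loader()

    # Present although there is no validation_step, hence the warning in the log.
    def val_dataloader(self):
        return self._loader()


if __name__ == "__main__":
    LightningCLI(Model, Data)
```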
Now from the CLI:
```
config.yaml  lightning_logs/  main.py
(aidsorb) [ansar@mofinium ligthning_bug]$ python main.py fit --config=config.yaml
Seed set to 1042
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/ansar/venvir/aidsorb/lib64/python3.11/site-packages/lightning/pytorch/trainer/configuration_validator.py:68: You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
[rank: 1] Seed set to 1042
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]

  | Name | Type   | Params | Mode
-----------------------------------------
0 | l1   | Linear | 330    | train
-----------------------------------------
330       Trainable params
0         Non-trainable params
330       Total params
0.001     Total estimated model params size (MB)
1         Modules in train mode
0         Modules in eval mode
/home/ansar/venvir/aidsorb/lib64/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument to `num_workers=9` in the `DataLoader` to improve performance.
/home/ansar/venvir/aidsorb/lib64/python3.11/site-packages/lightning/pytorch/loops/fit_loop.py:298: The number of training batches (32) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
Epoch 1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 1100.86it/s, v_num=3]
`Trainer.fit` stopped: `max_epochs=2` reached.
Epoch 1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 1045.92it/s, v_num=3]
```
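One possible way to check whether the CLI coerced the strings is to parse the same config without running a subcommand and inspect the parsed values; a hedged sketch (`run=False` and `cli.config` come from `LightningCLI`/jsonargparse, while `check.py`, `Model`, and `Data` refer to the hypothetical MWE above):

```python
# check.py (sketch) -- run as: python check.py --config=config.yaml
from lightning.pytorch.cli import LightningCLI

from main import Data, Model

# run=False parses the arguments and instantiates the classes
# without running any subcommand such as fit.
cli = LightningCLI(Model, Data, run=False)

# If the CLI coerced the quoted YAML strings via the type hints,
# these print <class 'int'>; otherwise they print <class 'str'>.
print(type(cli.config.seed_everything))
print(type(cli.config.trainer.max_epochs))
```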