LightningCLI doesn't fail when config.yaml contains invalid arguments #20337
Bug description
I was playing around with the `LightningCLI` and found out that it can still work even if the `config.yaml` contains invalid data types. For example, `max_epochs` for `Trainer` should be an `int`. However, it still succeeds with a `str` in the `.yaml`. In the MWE, you can see that `config.yaml` contains a `str` for both `seed_everything` and `max_epochs`. This is also evident when reading back the `config.yaml` file:
Note
I am not sure if this is really a bug, since it might be the case that the `LightningCLI` converts the given data types to the correct ones based on the type hints (see the check sketched after the log below). However, I couldn't find out whether this is really the case.

What version are you seeing the problem on?
v2.4
How to reproduce the bug
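A hypothetical reconstruction of the MWE, consistent with the log below (a single `Linear` layer named `l1` with 330 params, and a `val_dataloader` but no `validation_step`); apart from the file names `main.py` and `config.yaml`, all names and values are assumptions:

```yaml
# config.yaml (sketch): both values are quoted, i.e. YAML strings,
# yet `fit` runs without a type error.
seed_everything: "1042"
trainer:
  max_epochs: "2"
  accelerator: gpu
  devices: 2
```

```python
# main.py (sketch)
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from lightning.pytorch import LightningDataModule, LightningModule
from lightning.pytorch.cli import LightningCLI


class Model(LightningModule):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(32, 10)  # 330 trainable params, as in the log

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.l1(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


class Data(LightningDataModule):
    def _loader(self):
        x = torch.randn(128, 32)
        y = torch.randint(0, 10, (128,))
        return DataLoader(TensorDataset(x, y), batch_size=2)

    def train_dataloader(self):
        return self._loader()

    # Present although there is no validation_step, hence the warning in the log.
    def val_dataloader(self):
        return self._loader()


if __name__ == "__main__":
    LightningCLI(Model, Data)
```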
Now from the CLI:
```
config.yaml  lightning_logs/  main.py
(aidsorb) [ansar@mofinium ligthning_bug]$ python main.py fit --config=config.yaml
Seed set to 1042
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/ansar/venvir/aidsorb/lib64/python3.11/site-packages/lightning/pytorch/trainer/configuration_validator.py:68: You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
[rank: 1] Seed set to 1042
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]

  | Name | Type   | Params | Mode
-----------------------------------------
0 | l1   | Linear | 330    | train
-----------------------------------------
330       Trainable params
0         Non-trainable params
330       Total params
0.001     Total estimated model params size (MB)
1         Modules in train mode
0         Modules in eval mode
/home/ansar/venvir/aidsorb/lib64/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument to `num_workers=9` in the `DataLoader` to improve performance.
/home/ansar/venvir/aidsorb/lib64/python3.11/site-packages/lightning/pytorch/loops/fit_loop.py:298: The number of training batches (32) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
Epoch 1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 1100.86it/s, v_num=3]
`Trainer.fit` stopped: `max_epochs=2` reached.
Epoch 1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 1045.92it/s, v_num=3]
```
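One possible way to check whether the CLI coerced the strings is to parse the same config without running a subcommand and inspect the parsed values; a hedged sketch (`run=False` and `cli.config` come from `LightningCLI`/jsonargparse, while `check.py`, `Model`, and `Data` refer to the hypothetical MWE above):

```python
# check.py (sketch) -- run as: python check.py --config=config.yaml
from lightning.pytorch.cli import LightningCLI

from main import Data, Model

# run=False parses the arguments and instantiates the classes
# without running any subcommand such as fit.
cli = LightningCLI(Model, Data, run=False)

# If the CLI coerced the quoted YAML strings via the type hints,
# these print <class 'int'>; otherwise they print <class 'str'>.
print(type(cli.config.seed_everything))
print(type(cli.config.trainer.max_epochs))
```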