-
Notifications
You must be signed in to change notification settings - Fork 284
TypeError: optimizers must be either a single optimizer or a list of optimizers. #18
Description
Hello,
I'm trying to run main_swav.py with the following command:
python -m torch.distributed.launch --nproc_per_node=1 main_swav.py --images_path=<path to data directory> --train_annotations_path <path to data file> --epochs 400 --base_lr 0.6 --final_lr 0.0006 --warmup_epochs 0 --batch_size 32 --size_crops 224 96 --nmb_crops 2 6 --min_scale_crops 0.14 0.05 --max_scale_crops 1. 0.14 --use_fp16 true --freeze_prototypes_niters 5005 --queue_length 3840 --epoch_queue_starts 15
Some of those parameters have been added to accommodate our data. The only changes I have made to the code are minor changes to the dataset and additional/changed arguments. When I run this command I get the following error:
`Traceback (most recent call last):
File "main_swav.py", line 380, in
main()
File "main_swav.py", line 189, in main
model, optimizer = apex.amp.initialize(model, optimizer, opt_level="O1")
File "/opt/conda/lib/python3.6/site-packages/apex/amp/frontend.py", line 358, in initialize
return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 158, in _initialize
raise TypeError("optimizers must be either a single optimizer or a list of optimizers.")
TypeError: optimizers must be either a single optimizer or a list of optimizers.
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in
main()
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', 'main_swav.py', '--local_rank=0', '--images_path=/data/computer_vision_projects/rare_planes/classification_data/images/', '--train_annotations_path', '/data/computer_vision_projects/rare_planes/classification_data/annotations/instances_train_role_mislabel_category_id_033_chipped.json', '--epochs', '400', '--base_lr', '0.6', '--final_lr', '0.0006', '--warmup_epochs', '0', '--batch_size', '32', '--size_crops', '224', '96', '--nmb_crops', '2', '6', '--min_scale_crops', '0.14', '0.05', '--max_scale_crops', '1.', '0.14', '--use_fp16', 'true', '--freeze_prototypes_niters', '5005', '--queue_length', '3840', '--epoch_queue_starts', '15']' returned non-zero exit status 1.
make: *** [Makefile:69: train-rare-planes] Error 1`
Immediately before the line that throws the error I placed a couple print statements:
print("type(OPTIMIZER)", type(optimizer)) print("OPTIMIZER", optimizer)
The output from those is:
type(OPTIMIZER) <class 'apex.parallel.LARC.LARC'> OPTIMIZER SGD ( Parameter Group 0 dampening: 0 lr: 0.6 momentum: 0.9 nesterov: False weight_decay: 1e-06 )
Here are some version numbers I'm using:
Python 3.6.9 :: Anaconda, Inc. PyTorch == 1.5.0a0+8f84ded torchvision == 0.6.0a0 CUDA == 10.2 apex == 0.1
Any ideas why I would be seeing this error? Thanks in advance!