
args.learning_rate is getting set to None in flux_train.py, which breaks Adafactor because it can't set initial_lr from learning_rate #3110

Open
@oceanus52

Description


Something in flux_train.py is setting `args.learning_rate` to `None`.
I added a few debug loggers to help me locate the problem, and it's happening somewhere in this block, around line 333 of flux_train.py (a tracing sketch follows the snippet):
```python
if args.blockwise_fused_optimizers:
    ...  # something in here

logger.info("args.blockwise before learn_rate = " + str(args))  # my debug logger

# prepare dataloader
# strategies are set here because they cannot be referenced in another process. Copy them with the dataset
# some strategies can be None
train_dataset_group.set_current_strategies()
```
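
To pin down the exact assignment, a proxy around `args` can log a stack trace the moment anything writes `None` into `learning_rate`. This is a minimal sketch, assuming `args` is a plain `argparse.Namespace` whose fields are set via normal attribute assignment; `ArgsWatcher` is a hypothetical helper, not part of flux_train.py or sd-scripts:

```python
import logging
import traceback

logger = logging.getLogger(__name__)


class ArgsWatcher:
    """Hypothetical proxy around args that reports who clears learning_rate."""

    def __init__(self, wrapped):
        object.__setattr__(self, "_wrapped", wrapped)

    def __getattr__(self, name):
        # delegate attribute reads to the real namespace
        return getattr(object.__getattribute__(self, "_wrapped"), name)

    def __setattr__(self, name, value):
        # log a full stack trace when learning_rate is cleared
        if name == "learning_rate" and value is None:
            logger.warning(
                "learning_rate set to None here:\n%s",
                "".join(traceback.format_stack()),
            )
        setattr(object.__getattribute__(self, "_wrapped"), name, value)


# usage (just before the suspect block): args = ArgsWatcher(args)
```

Since later code sees the proxy instead of the real namespace, this is only meant for a quick bisect, not something to leave in place.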

The mutation happens somewhere between the `if args.blockwise_fused_optimizers:` check (around line 333 in my copy) and the dataloader preparation around line 390.

Something in that range sets `args.learning_rate` to `None`, and later, when Adafactor sets `initial_lr`, it throws an error because the learning rate argument is no longer set. A possible stopgap is sketched below.
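
Until the root cause is found, one stopgap is to snapshot the configured learning rate before the optimizer setup and restore it afterwards, so whatever derives Adafactor's `initial_lr` still sees a concrete float. A minimal sketch, assuming it is placed inside flux_train.py where `args` and `logger` already exist; `saved_lr` is a hypothetical local:

```python
saved_lr = args.learning_rate  # snapshot before the suspect block

# ... blockwise_fused_optimizers / optimizer setup runs here ...

# restore the value if something in the block cleared it
if args.learning_rate is None and saved_lr is not None:
    logger.warning("learning_rate was cleared during optimizer setup; restoring %s", saved_lr)
    args.learning_rate = saved_lr
```

This papers over the symptom rather than fixing whichever call mutates `args`, but it should at least let Adafactor's scheduler receive a valid `initial_lr`.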
