
args.learning_rate is getting set to None in flux_train.py, which breaks Adafactor because it can't set initial_lr from learning_rate #3110

Open
@oceanus52

Description


Something in flux_train.py is setting `args.learning_rate` to `None`.
I added a few debug loggers to help me locate the problem, and it's happening somewhere in this block, around line 333 of flux_train.py (a tracing sketch follows the snippet):
```python
if args.blockwise_fused_optimizers:
    ...  # something in here

logger.info("args.blockwise before learn_rate = " + str(args))  # my debug logger

# prepare dataloader
# strategies are set here because they cannot be referenced in another process. Copy them with the dataset
# some strategies can be None
train_dataset_group.set_current_strategies()
```
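
To pin down the exact assignment, a proxy around `args` can log a stack trace the moment anything writes `None` into `learning_rate`. This is a minimal sketch, assuming `args` is a plain `argparse.Namespace` whose fields are set via normal attribute assignment; `ArgsWatcher` is a hypothetical helper, not part of flux_train.py or sd-scripts:

```python
import logging
import traceback

logger = logging.getLogger(__name__)


class ArgsWatcher:
    """Hypothetical proxy around args that reports who clears learning_rate."""

    def __init__(self, wrapped):
        object.__setattr__(self, "_wrapped", wrapped)

    def __getattr__(self, name):
        # delegate attribute reads to the real namespace
        return getattr(object.__getattribute__(self, "_wrapped"), name)

    def __setattr__(self, name, value):
        # log a full stack trace when learning_rate is cleared
        if name == "learning_rate" and value is None:
            logger.warning(
                "learning_rate set to None here:\n%s",
                "".join(traceback.format_stack()),
            )
        setattr(object.__getattribute__(self, "_wrapped"), name, value)


# usage (just before the suspect block): args = ArgsWatcher(args)
```

Since later code sees the proxy instead of the real namespace, this is only meant for a quick bisect, not something to leave in place.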

The mutation happens somewhere between the `if args.blockwise_fused_optimizers:` check (around line 333 in my copy) and the dataloader preparation around line 390.

Something in that range sets `args.learning_rate` to `None`, and later, when Adafactor sets `initial_lr`, it throws an error because the learning rate argument is no longer set. A possible stopgap is sketched below.
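
Until the root cause is found, one stopgap is to snapshot the configured learning rate before the optimizer setup and restore it afterwards, so whatever derives Adafactor's `initial_lr` still sees a concrete float. A minimal sketch, assuming it is placed inside flux_train.py where `args` and `logger` already exist; `saved_lr` is a hypothetical local:

```python
saved_lr = args.learning_rate  # snapshot before the suspect block

# ... blockwise_fused_optimizers / optimizer setup runs here ...

# restore the value if something in the block cleared it
if args.learning_rate is None and saved_lr is not None:
    logger.warning("learning_rate was cleared during optimizer setup; restoring %s", saved_lr)
    args.learning_rate = saved_lr
```

This papers over the symptom rather than fixing whichever call mutates `args`, but it should at least let Adafactor's scheduler receive a valid `initial_lr`.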
