The following warning appears: tried to get lr value before scheduler/optimizer started stepping, returning lr=0 #134
Comments
What if you set logging_steps to 10 or above?
With 10 or above it definitely shows up, but the problem is that the bloom config sets "gradient_accumulation_steps": 32, which means every logged step already covers 32 batches. If the first few logged steps still show no learning rate, something seems a bit off.
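(Back-of-the-envelope arithmetic for how much data sits behind one logged step; the GPU count below is an assumption, not from the issue.)

per_device_train_batch_size = 1      # from the bloom config in this issue
gradient_accumulation_steps = 32     # from the bloom config in this issue
num_gpus = 8                         # assumed; adjust to the actual setup

samples_per_logged_step = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(samples_per_logged_step)       # 256 samples already consumed by a single logged step (logging_steps=1)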
I have seen something similar in the transformers issues. One explanation is that setting the lr and optimizer in the deepspeed config causes this; another is that the model was pretrained in bf16 but is now being run with fp16. The issue is as follows:
If you run with the official bloom config and deepspeed config, does the lr = 0 problem appear?
Is your machine an A100? We did not run into the lr=0 problem in our experiments.
It is an 80GB A100. If you set logging_steps to 1, does it happen on your side?
We will find time to try it and see whether we can reproduce the problem.
Great, looking forward to your feedback.
Hi, I ran the following experiments with the bloom config, deepspeed config #1, and deepspeed config #2 (the latter close to the officially provided config):
"overwrite": true, "scheduler": {
When fine-tuning the 1b1 model alone, without deepspeed, the learning rate changes as follows:
With deepspeed config #1, the learning rate changes as follows:
With deepspeed config #2, the learning rate changes as follows:
Looking at the Trainer, its default optimizer appears to be AdamW, but the optimizer type in the officially provided deepspeed config is adam. Also, if the deepspeed config includes fp16 and an lr scheduler, the learning rate is 0 for the first few steps. @xianghuisun
One more question: when doing instruction fine-tuning with bloom, does the vocabulary of the bloom-7b1 model need to be expanded? @xianghuisun
Why does the learning rate keep going up as training proceeds?
That is the warmup_lr.
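To illustrate why a rising learning rate is expected during warmup, here is a minimal sketch with a standard linear warmup schedule (the warmup and total step counts are made-up example values, not from this issue):

import torch
from transformers import get_linear_schedule_with_warmup

param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.AdamW([param], lr=1e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=1000
)

for step in range(5):
    optimizer.step()
    scheduler.step()
    print(step, scheduler.get_last_lr()[0])  # climbs linearly toward 1e-5 during warmup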
For me the lr stays at 0 the whole time. What could be causing that? The warning never goes away.
Try removing the fp16 and lr scheduler blocks from the deepspeed config, switching the optimizer to AdamW, and running with my configuration.
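For reference, a minimal sketch of the kind of DeepSpeed config that comment describes: no "fp16" block, no "scheduler" block, optimizer set to AdamW. The exact fields and "auto" values below are illustrative assumptions rather than the commenter's file; the dict can be passed via TrainingArguments(deepspeed=...) or saved as JSON.

ds_config = {
    "train_batch_size": "auto",
    "gradient_accumulation_steps": "auto",
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto"},
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    # No "fp16" and no "scheduler" section: the HF Trainer then supplies its own
    # lr scheduler instead of DeepSpeed's WarmupLR.
}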
I have tried those settings and still get the same problem. I did not even enable warmup, I am using bf16, multi-node multi-GPU. The issue now is that it is unpredictable how many steps it takes for the lr to move off 0: sometimes it happens quickly, sometimes it takes a few hundred steps, and sometimes it never does. I never hit this when fine-tuning other models... could it be a hardware or environment problem?
I don't think it is a hardware or environment problem. I pasted a transformers issue earlier in this thread; this may be caused by the parameters bloom was pretrained with. I am not certain, so I hope the maintainers can find time to verify it and track down the cause.
I am using llama and hit the same problem.
Is there a solution, bro? @HalcyonLiang
I did not dig into the root cause; I just compared different configs and switched to one that avoids the problem.
You really do have a lot of GPUs, impressive.
I recently ran into this problem as well while fine-tuning llama-7b-hf with peft LoRA. It turned out to be a library version issue: downgrading transformers to 4.28.0 and deepspeed to 0.8.3 fixed it.
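A quick check to confirm you actually ended up on the version combination reported as working here (4.28.0 / 0.8.3 are the commenter's values, not a general requirement):

import importlib.metadata as md

print("transformers", md.version("transformers"))  # expected: 4.28.0
print("deepspeed", md.version("deepspeed"))        # expected: 0.8.3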
Thanks! Your method worked for me!
Perhaps the batch size is set so large that it leads to "CUDA out of memory", but the program does not report an error. Try making the "train_micro_batch_size_per_gpu" parameter smaller. Here is what I tried: train_micro_batch_size_per_gpu = 4 and train_micro_batch_size_per_gpu = 1.
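For reference, that parameter lives in the DeepSpeed config; a minimal fragment with the values the commenter reports trying:

ds_config_fragment = {
    "train_micro_batch_size_per_gpu": 1,  # the commenter also tried 4
}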
Thanks. Downgrading transformers to 4.28.0 while keeping deepspeed at 0.12.6 also solved the problem for me.
I hit this problem (lr stuck at 0) while fine-tuning flan-t5 models with deepspeed. Of the methods above, only downgrading transformers worked, and deepspeed did not need to be downgraded: transformers==4.40 -> 4.28.1, deepspeed==0.9.3.
Not sure whether this is a feature or a bug [/doge]: https://github.com/huggingface/transformers/blob/main/src/transformers/trainer_pt_utils.py#L912

def _get_learning_rate(self):
    if self.is_deepspeed_enabled:
        # with deepspeed's fp16 and dynamic loss scale enabled the optimizer/scheduler steps may
        # not run for the first few dozen steps while loss scale is too large, and thus during
        # that time `get_last_lr` will fail if called during that warm up stage, so work around it:
        try:
            last_lr = self.lr_scheduler.get_last_lr()[0]
        except AssertionError as e:
            if "need to call step" in str(e):
                logger.warning("tried to get lr value before scheduler/optimizer started stepping, returning lr=0")
                last_lr = 0
            else:
                raise
    else:
        if isinstance(self.lr_scheduler, torch.optim.lr_scheduler.ReduceLROnPlateau):
            last_lr = self.optimizer.param_groups[0]["lr"]
        else:
            last_lr = self.lr_scheduler.get_last_lr()[0]
        if torch.is_tensor(last_lr):
            last_lr = last_lr.item()
    return last_lr
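For what it's worth, here is a small sketch, not DeepSpeed's actual code, of the mechanism described in the comment above: under fp16 with dynamic loss scaling, a step whose gradients overflow is skipped entirely, so the lr scheduler never advances and get_last_lr() has nothing to return yet, which is exactly when the Trainer falls back to logging lr=0. All names here are illustrative assumptions.

def fp16_training_step(optimizer, lr_scheduler, grads_overflowed, loss_scale):
    # Illustration only: the structure is an assumption, not DeepSpeed internals.
    if grads_overflowed:
        # The whole step is skipped: neither optimizer.step() nor lr_scheduler.step()
        # runs, and the loss scale is reduced before the next attempt.
        return loss_scale / 2
    optimizer.step()
    lr_scheduler.step()  # only after this does get_last_lr() report a real value
    return loss_scale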
Hello, when fine-tuning the bloom-7b model on an instruction-tuning dataset with the finetune script, the first few steps produce:
tried to get lr value before scheduler/optimizer started stepping, returning lr=0
What causes this warning?
The bloom config is:
{
"model_type": "bloom",
"model_name_or_path": "bigscience/bloomz-7b1-mt",
"data_path": "data/res/merge_data.json",
"output_dir": "trained_models/bloom",
"per_device_train_batch_size": 1,
"num_epochs": 2,
"learning_rate": 1e-5,
"cutoff_len": 1024,
"val_set_size": 1000,
"val_set_rate": 0.1,
"save_steps": 1000,
"eval_steps": 1000,
"logging_steps": 1,
"gradient_accumulation_steps": 32
}
The deepspeed config is:
{
"train_batch_size": "auto",
"overwrite":true,
"gradient_accumulation_steps": "auto",
"fp16": {
"enabled": true,
"min_loss_scale": 1,
"opt_level": "O2"
},
"zero_optimization": {
"stage": 3,
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
"offload_param": {
"device": "cpu",
"pin_memory": true
},
"overlap_comm": true,
"contiguous_gradients": true,
"sub_group_size": 1e9,
"reduce_bucket_size": "auto",
"stage3_prefetch_bucket_size": "auto",
"stage3_param_persistence_threshold": "auto",
"stage3_max_live_parameters": 1e9,
"stage3_max_reuse_distance": 1e9,
"stage3_gather_16bit_weights_on_model_save": true
},
"scheduler": {
"type": "WarmupLR",
"params": {
"warmup_min_lr": "auto",
"warmup_max_lr": "auto",
"warmup_num_steps": "auto"
}
}
}