[BUG] overflow warning needs to be different for fp16 and non-fp16 #2911
Describe the bug
This code has an issue when it is run under a non-fp16 regime.
DeepSpeed/deepspeed/runtime/zero/stage3.py
Lines 1837 to 1842 in da84e60
There are no scalers under bf16/fp32. So this warning is alarming to see - we rushed to see if somehow the config was broken, but it wasn't.
It should only say the "Attempted loss scale:..." part under fp16.
Most likely the same applies to its counterpart in stage 1/2.
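To make the request concrete, here is a minimal sketch of the behavior I have in mind. It is not the actual DeepSpeed code: the function name `overflow_message`, its parameters, and the exact message wording are hypothetical and only illustrate gating the loss-scale part on fp16.

```python
from typing import Optional


def overflow_message(
    rank: int,
    fp16_enabled: bool,
    prev_scale: Optional[float] = None,
    new_scale: Optional[float] = None,
) -> str:
    """Build the skip-step warning (hypothetical helper, not a DeepSpeed API)."""
    # This part applies to every dtype: the step is skipped either way.
    msg = f"OVERFLOW! Rank {rank} skipping step."
    # Loss scaling only exists under fp16, so mention it only there.
    if fp16_enabled and prev_scale is not None and new_scale is not None:
        msg += f" Attempted loss scale: {prev_scale}, reducing to {new_scale}"
    return msg


print(overflow_message(0, True, 65536.0, 32768.0))  # includes the loss-scale part
print(overflow_message(0, False))                   # bf16/fp32: no loss-scale part
```

With something like this, a bf16 or fp32 run would still report the skipped step but would not mention a loss scale that does not exist in that configuration.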
Also, do you think it would be helpful to tell the user specifically whether it's Inf or NaN? Is a NaN really an overflow? Perhaps one of you with a more rigorous math background knows better. I think overflow is only one of many ways to end up with a NaN, so a NaN doesn't always mean an overflow. Please correct me if I'm wrong.
I'm asking because it would help the user know what to look for: NaNs, infinities, or something else.
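As a rough illustration of the distinction, the sketch below uses plain Python floats. Under IEEE 754 arithmetic, exceeding the representable range yields Inf, while NaN comes from invalid operations such as Inf minus Inf, so a NaN may or may not trace back to an overflow. The helper `describe_non_finite` is a hypothetical name, not something that exists in DeepSpeed.

```python
import math
from typing import Iterable


def describe_non_finite(values: Iterable[float]) -> str:
    """Report which kinds of non-finite values are present (hypothetical helper)."""
    has_nan = False
    has_inf = False
    for v in values:
        if math.isnan(v):
            has_nan = True
        elif math.isinf(v):
            has_inf = True
    if has_nan and has_inf:
        return "NaN and Inf"
    if has_nan:
        return "NaN"
    if has_inf:
        return "Inf"
    return "finite"


overflowed = 1e308 * 10            # exceeds the float64 range, becomes inf
invalid = overflowed - overflowed  # inf - inf is an invalid operation, becomes nan

print(describe_non_finite([1.0, overflowed]))      # Inf
print(describe_non_finite([1.0, invalid]))         # NaN
print(describe_non_finite([overflowed, invalid]))  # NaN and Inf
print(describe_non_finite([1.0, 2.0]))             # finite
```

For gradient tensors the same check could presumably be built on `torch.isnan` and `torch.isinf`, and the resulting label could be appended to the warning so the user knows which of the two to hunt for.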