Hello.
First of all, thanks for sharing the BitNet training code.
I have a question about GPU memory usage.
As I understand it, BitNet should reduce VRAM usage compared to fp16/bf16 precision.
However, when I comment out this line in train_bitnet.py:

```python
model = apply_bitlinear(model, target_layers=target_layers)  # comment this to train og llama
```

memory usage drops by about 2 GB (13 GB with the BitLinear layers vs. 11 GB without).
Shouldn't using BitNet result in lower memory usage, not higher?
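For context, my rough mental model of what apply_bitlinear sets up is something like the sketch below. This is my own guess at a BitNet-b1.58-style quantization-aware BitLinear, not the repo's actual code; I'm assuming apply_bitlinear just swaps the nn.Linear modules named in target_layers for something like this:

```python
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Rough sketch of a BitNet-b1.58-style quantization-aware linear layer (my guess)."""

    def forward(self, x):
        w = self.weight  # latent weight, still stored and updated in full precision

        # Ternarize weights to {-1, 0, +1} * scale, with a straight-through
        # estimator so gradients flow back to the latent full-precision weight.
        scale_w = w.abs().mean().clamp(min=1e-5)
        w_q = (w / scale_w).round().clamp(-1, 1) * scale_w
        w_q = w + (w_q - w).detach()

        # 8-bit absmax activation quantization, also straight-through.
        scale_x = x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5) / 127.0
        x_q = (x / scale_x).round().clamp(-128, 127) * scale_x
        x_q = x + (x_q - x).detach()

        return F.linear(x_q, w_q, self.bias)
```

If that's roughly right, the latent weights stay in full precision during training and extra quantized tensors are materialized in every forward pass, so I can see why VRAM might not drop, but the extra ~2 GB still surprises me.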
Thanks.