finetune: Add -ngl parameter #3762
Conversation
When I tried CUDA offloading during finetuning by following the README, I hit an assert here. This probably isn't an important case, because inference later warns that you should use f16 or f32 instead when using LoRA:
"error: operator (): Finetuning on tensors with type 'f16' is not yet supported"
I believe it did work for me (with these changes) for f32, with an f16 base model. The result was something I couldn't directly use with --lora and GPU inference, but I could use it with --lora and CPU inference, and it did work (I verified that it had finetuned). The only way your statement could be true, afaik, is if 'finetune' didn't actually offload to GPU when it told me it did (I offloaded 25 out of 29 layers).
@ggerganov What I can say is that it schedules work on the GPU, uses a bunch of VRAM, and, while I was developing, it hit not-implemented code paths (hence the code I added). The console output says it's using the GPU, and it runs faster (though I don't know whether it's doing the same amount of work; it produces a LoRA that works but is somehow different in format from the one produced by the CPU run). Are you sure GPU offloading isn't supported? It seems hard to believe that it can do all this AND do the 'real' work on the CPU AND produce a different result AND run faster.
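For reference, the "code I added" is the kind of guard that makes an unsupported type combination fail with a readable message instead of a bare assert inside the CUDA add path. Below is a simplified stand-in; the function name and dispatch are illustrative, not the exact ggml-cuda.cu diff:

```cpp
// Simplified stand-in for guarding a CUDA add op: report the offending
// tensor types before asserting, so a finetune run with -ngl fails with a
// readable message rather than a bare GGML_ASSERT.
#include <cassert>
#include <cstdio>

#include "ggml.h"

static void cuda_op_add_guarded(const ggml_tensor * src0,
                                const ggml_tensor * src1,
                                ggml_tensor * dst) {
    const bool supported =
        (src0->type == GGML_TYPE_F32 && src1->type == GGML_TYPE_F32 && dst->type == GGML_TYPE_F32) ||
        (src0->type == GGML_TYPE_F16 && src1->type == GGML_TYPE_F32 && dst->type == GGML_TYPE_F16) ||
        (src0->type == GGML_TYPE_F16 && src1->type == GGML_TYPE_F32 && dst->type == GGML_TYPE_F32);

    if (!supported) {
        fprintf(stderr, "%s: unsupported type combination: src0=%s src1=%s dst=%s\n",
                __func__,
                ggml_type_name(src0->type),
                ggml_type_name(src1->type),
                ggml_type_name(dst->type));
        assert(false);
    }

    // ... launch the CUDA kernel matching this type combination ...
}
```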
Nice work.
Ah, interesting. I hadn't realized that we load the base model with the standard loader.
Use spaces instead of tabs and we can merge
I don't know why it still thinks there are conflicts in ggml.c; I merged from master. And when I click 'Resolve conflicts' I get a weird, information-devoid view of ggml.c (which looks the same as before I merged from master).
It looks like you merged master from your repo and not from this one. Edit: merged.
* Add '-ngl' support to finetune.cpp
* Add fprintf in ggml_cuda_op_add

  When I tried CUDA offloading during finetuning following the readme, I got an assert here. This probably isn't an important case because inference later gives a warning saying you should use f16 or f32 instead when using lora

* Add 'finetune.sh', which currently fails when using GPU

  "error: operator (): Finetuning on tensors with type 'f16' is not yet supported"

* tweak finetune.sh
* Suppress some warnings in ggml.c
* Add f16 implementation to ggml_compute_forward_add_f16_f32
* Add an f16 case to ggml_add_cast_impl and llama_build_lora_finetune_graphs
* finetune.sh: Edit comments
* Add "add_f16_f32_f32_cuda"
* Tweak an error message
* finetune.sh: Add an optional LLAMA_MODEL_DIR variable
* finetune.sh: Add an optional LLAMA_TRAINING_DIR variable
* train : minor
* tabs to spaces

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
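To make the new mixed-type path concrete ("Add f16 implementation to ggml_compute_forward_add_f16_f32", "add_f16_f32_f32_cuda"), here is a hedged CPU-side sketch of the inner loops such ops need: convert each f16 element to f32, add the f32 operand, and store the result as either f16 or f32. It uses ggml's public ggml_fp16_to_fp32 / ggml_fp32_to_fp16 helpers and plain arrays; the real kernels walk ggml_tensor rows and strides:

```cpp
// Hedged sketch of the f16 + f32 add that the PR teaches ggml/CUDA to handle.
// Plain arrays stand in for ggml tensors; the real code walks rows/strides.
#include <cstddef>

#include "ggml.h" // ggml_fp16_t, ggml_fp16_to_fp32, ggml_fp32_to_fp16

// dst (f16) = src0 (f16) + src1 (f32), element-wise
static void add_f16_f32_f16(const ggml_fp16_t * src0, const float * src1,
                            ggml_fp16_t * dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = ggml_fp32_to_fp16(ggml_fp16_to_fp32(src0[i]) + src1[i]);
    }
}

// dst (f32) = src0 (f16) + src1 (f32), element-wise -- the combination needed
// when the base model is f16 and the lora finetune tensors are f32
static void add_f16_f32_f32(const ggml_fp16_t * src0, const float * src1,
                            float * dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = ggml_fp16_to_fp32(src0[i]) + src1[i];
    }
}
```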
Fix #3458