Adding full weight finetuning #499
Conversation
|
I think it would be wiser to first merge this and then slowly add the other parts (GRPO, DPO) afterwards. What do you think @Blaizzy? |
|
I was finally able to train the first model for 100 steps: |
|
Usually it crashed with OOM after 10 steps :D |
|
You can now train the vision part too, and 4-bit quant training works as well, though only on Qwen2 for now; Qwen2.5 gives a NaN loss: python -m mlx_vlm.lora |
|
I updated some things and it's a lot faster and uses a lot less RAM: Iter 29: Train loss 8.377, Learning Rate 1.000e-04, It/sec 7.091, Tokens/sec 226.918, Trained Tokens 928, Peak mem 1.839 GB |
|
For comparison, the data from step 40 before these changes: It/sec 1.851, Peak mem 7.643 GB |
|
Qwen2 models work, both quantized and full precision, with both LoRA and full weight training. |
|
Qwen2.5 is added too |
|
@Blaizzy here is a test adapter (LLM): https://huggingface.co/Goekdeniz-Guelmez/MLX-VLM-Qwen2-VL-2B-Instruct-bf16-VisualWebInstruct-lora/blob/main/README.md The command used here is: python -m mlx_vlm.lora --model-path mlx-community/Qwen2-VL-2B-Instruct-bf16 --dataset TIGER-Lab/VisualWebInstruct --dataset-config 'example' --output-path Desktop/Qwen2-VL-2B-Instruct-bf16-VisualWebInstruct-lora --batch-size 1 --epochs 1 --learning-rate 1e-6 --grad-checkpoint --train-on-completions --steps-per-report 1 @Blaizzy it would be great if you could try out a larger model using this command. |
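For readability, the same invocation from the comment above split across lines (model path, dataset, and flags exactly as given there):

```shell
python -m mlx_vlm.lora \
    --model-path mlx-community/Qwen2-VL-2B-Instruct-bf16 \
    --dataset TIGER-Lab/VisualWebInstruct \
    --dataset-config 'example' \
    --output-path Desktop/Qwen2-VL-2B-Instruct-bf16-VisualWebInstruct-lora \
    --batch-size 1 \
    --epochs 1 \
    --learning-rate 1e-6 \
    --grad-checkpoint \
    --train-on-completions \
    --steps-per-report 1
```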
|
#187 DeepSeek-VL 1 has been added. |
|
@Goekdeniz-Guelmez and @Blaizzy : Any update on this please? |
|
@sachinraja13 I will continue working on it later this week, after finishing all the Gabliteration project to-dos. |
|
Many thanks for all your contributions, @Goekdeniz-Guelmez! Very helpful! Looking forward to it! |
|
However, you can try Qwen2, Qwen2.5, Qwen3, and Gemma 3 and let me know how it goes. |
…ypes and update supported models list
…els list adding Qwen3 Omni MoE
|
How is it going? Any updates here? I'm making some major changes in #681, and after that I will add vision attention chunking to reduce peak memory usage and OOM errors when processing images at 2K resolution and above. |
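For context, a minimal sketch of what query-chunked attention can look like in MLX; the helper name, shapes, and chunk size are illustrative assumptions, not code from this PR or #681:

```python
# Hypothetical sketch of query-chunked attention to cap peak memory.
# Assumes mlx is installed; names, shapes, and chunk size are illustrative.
import mlx.core as mx

def chunked_attention(q, k, v, chunk_size=1024):
    """Compute softmax(q @ k^T * scale) @ v over query chunks.

    q, k, v: arrays of shape (num_heads, seq_len, head_dim).
    Processing `chunk_size` query rows at a time keeps the score matrix at
    (heads, chunk_size, seq_len) instead of (heads, seq_len, seq_len).
    """
    scale = q.shape[-1] ** -0.5
    outputs = []
    for start in range(0, q.shape[1], chunk_size):
        q_blk = q[:, start:start + chunk_size, :] * scale
        scores = q_blk @ k.transpose(0, 2, 1)   # (heads, chunk, seq_len)
        weights = mx.softmax(scores, axis=-1)
        outputs.append(weights @ v)             # (heads, chunk, head_dim)
    return mx.concatenate(outputs, axis=1)
```

The point is that only one chunk-sized score matrix is materialized at a time, which is what bounds peak memory when the vision tower produces very long token sequences for high-resolution images.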
This is a new branch, since the old one was not comprehensible and led to too many errors; the old PR will be closed later. Full weight finetuning works on the Qwen models, including quantized ones.