Confirmed: Qwen Image LoRA training works on Apple Silicon (M3 Ultra, MPS, fp32) #13170
MsFixer101 started this conversation in Show and tell
Summary
Successfully trained a Qwen Image DreamBooth LoRA on Apple Silicon using the MPS backend — this appears to be the first public confirmation of it working.
Hardware: Mac Studio M3 Ultra, 256GB unified memory
Software: PyTorch 2.10.0, diffusers 0.37.0.dev0 (from git), peft, accelerate
Script: examples/dreambooth/train_dreambooth_lora_qwen_image.py
Speed: ~4.5 seconds/step at 512px resolution
Result: Training completes with decreasing loss, and LoRA weights are saved successfully
Key Finding: fp32 Required
fp16 produces NaN loss on MPS. The script correctly disables `native_amp` for MPS (line ~935), but without gradient scaling, fp16 gradients underflow to NaN. Switching to `--mixed_precision="no"` (fp32) resolves this completely.

Working Command
What Works
- `--offload` (text encoder + VAE moved to CPU when not in use)
- `--cache_latents` (pre-compute VAE outputs, removing the VAE from the training loop)

What Doesn't Work
- `--mixed_precision="fp16"` — NaN loss (no AMP gradient scaler on MPS)
- `--mixed_precision="bf16"` — correctly blocked by the script (MPS doesn't support bf16)
- `--use_8bit_adam` — bitsandbytes is CUDA-only

Speed Projections (fp32, 512px)
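The projection table itself didn't survive extraction, but at the measured ~4.5 s/step the projections are straightforward arithmetic. A minimal sketch (the step counts are illustrative, not from the original table):

```python
SECONDS_PER_STEP = 4.5  # measured in this run at 512px, fp32

def projected_hours(steps: int, seconds_per_step: float = SECONDS_PER_STEP) -> float:
    """Wall-clock estimate for a given number of optimizer steps."""
    return steps * seconds_per_step / 3600

for steps in (500, 1000, 2000):
    print(f"{steps:>5} steps ~ {projected_hours(steps):.2f} h")
```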
Notes
- No `PYTORCH_ENABLE_MPS_FALLBACK` warnings during training — all ops ran natively on Metal
- `--offload` …

Environment
- … `check_min_version`)

Hope this helps others trying to train Qwen Image LoRAs on Mac!
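As a postscript on the fp16 finding: the gradient underflow behind the NaN losses is easy to demonstrate with the standard library's IEEE 754 half-precision (`'e'`) struct format. This illustration is mine, not part of the original post:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a float through IEEE 754 half precision."""
    return struct.unpack('e', struct.pack('e', x))[0]

# A typically sized gradient survives the round trip...
print(to_fp16(1e-3))
# ...but anything well below half precision's smallest subnormal (~6e-8)
# rounds to zero. Without an AMP gradient scaler (unavailable on MPS),
# small fp16 gradients silently vanish and training destabilizes.
print(to_fp16(1e-8))  # 0.0
```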