Training worked locally on 4 chips with `fsdp=4`:

```sh
export PJRT_DEVICE=TPU
export TORCHPRIME_TPU_TYPE=v6e-4
python torchprime/torch_xla_models/train.py model=flex-qwen-1b
```
MFU: 0.21
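
For context, MFU (model FLOPs utilization) is the achieved model FLOP/s divided by the hardware's peak FLOP/s. A rough sketch of that calculation; the function and parameter names are illustrative, not torchprime's metric code:

```python
# Rough sketch of the MFU calculation (illustrative, not torchprime's code).
def mfu(model_flops_per_step: float, step_time_s: float,
        num_chips: int, peak_flops_per_chip: float) -> float:
    """Fraction of the hardware's peak FLOP/s actually used by the model."""
    achieved_flops_per_s = model_flops_per_step / step_time_s
    peak_flops_per_s = num_chips * peak_flops_per_chip
    return achieved_flops_per_s / peak_flops_per_s
```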
On a v5p-128 cluster, training is launched with:

```sh
tp run --name jialei-0812-qwen-fsdp32tensor2 \
  torchprime/torch_xla_models/train.py \
  model=flex-qwen-1b \
  task.global_batch_size=64 \
  ici_mesh.fsdp=x ici_mesh.tensor=y
```
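
In SPMD the product of the ICI mesh axes generally has to equal the number of addressable devices in the slice. A minimal sketch of that sanity check, assuming `torch_xla`'s `xr.global_runtime_device_count()`; this is not torchprime's own validation code:

```python
# Minimal sanity check (not torchprime's validation code): the ICI mesh axes
# must multiply to the number of TPU devices visible to the process.
import torch_xla.runtime as xr

def check_ici_mesh(fsdp: int, tensor: int) -> None:
    num_devices = xr.global_runtime_device_count()
    assert fsdp * tensor == num_devices, (
        f"ici_mesh.fsdp ({fsdp}) * ici_mesh.tensor ({tensor}) = {fsdp * tensor}, "
        f"but the slice has {num_devices} devices"
    )

# e.g. the run name above suggests fsdp=32, tensor=2, which would match the
# 64 chips of a v5p-128 slice.
check_ici_mesh(32, 2)
```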