Deepspeed Ulysses sequence parallel is not working for Gemma4 #8002

mingxiang1006 · 2026-05-10T17:12:16Z

I have long contexts (> 64k tokens) and need to use DeepSpeed ZeRO-3 with DeepSpeed Ulysses sequence parallelism. However, because of the model architecture, the local and global attention layers have different head dimensions (256 and 512, respectively), so my QKV tensors don't match the target size: the target shape always uses the global head dim, while my QKV tensors have the local head dim. I'd appreciate any insights on this, and I can share my debug logs if needed.

vincere-mori · 2026-05-24T14:39:54Z

The root issue is that Gemma4 uses different head dimensions for local vs. global attention (256 and 512, respectively), while Ulysses assumes a uniform head_dim across all heads. When it does the all-to-all scatter/gather and reshapes the gathered tensor, the target shape is built with one head_dim, so a tensor carrying the other head_dim doesn't fit.
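
To make the failure concrete, here is a single-process NumPy simulation of the Ulysses all-to-all. It is a simplified sketch, not DeepSpeed's code: it assumes a `[seq, heads, head_dim]` layout with no batch dimension, toy sizes rather than Gemma4's real head counts, and a hypothetical helper `ulysses_all_to_all`. Each rank starts with one sequence chunk and all heads, and ends with the full sequence and `num_heads / sp_size` heads. The last step mimics a reshape whose target is built from one fixed head_dim (the global 512) applied to a local-attention tensor (256).

```python
import numpy as np


def ulysses_all_to_all(shards):
    """Single-process simulation of the Ulysses all-to-all (hypothetical helper).

    shards[r] is rank r's tensor of shape [local_seq, num_heads, head_dim]
    (one sequence chunk, all heads). Entry r of the result has shape
    [local_seq * sp_size, num_heads // sp_size, head_dim]
    (full sequence, one head group). head_dim is read from the input.
    """
    sp_size = len(shards)
    _, num_heads, _ = shards[0].shape
    if num_heads % sp_size:
        raise ValueError("num_heads must be divisible by the SP group size")
    hpr = num_heads // sp_size  # heads per rank after the exchange
    # Rank r collects head group r from every rank's sequence chunk, in sequence order.
    return [
        np.concatenate([s[:, r * hpr:(r + 1) * hpr, :] for s in shards], axis=0)
        for r in range(sp_size)
    ]


rng = np.random.default_rng(0)
sp_size, local_seq, num_heads = 4, 4, 8  # toy sizes, not Gemma4's real config
local_head_dim, global_head_dim = 256, 512

shards = [rng.standard_normal((local_seq, num_heads, local_head_dim)) for _ in range(sp_size)]
gathered = ulysses_all_to_all(shards)
print(gathered[0].shape)  # (16, 2, 256)

# Reshaping a local-attention tensor with a hard-coded global head_dim fails:
try:
    gathered[0].reshape(local_seq * sp_size, num_heads // sp_size, global_head_dim)
except ValueError as err:
    print("reshape failed:", err)
```

Reading head_dim from the tensor itself (`x.shape[-1]`) instead of a single config value is the minimal change when all heads in a tensor share one head_dim.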

One workaround: intercept the attention forward call and handle local/global heads separately. Split QKV by head type before the all-to-all, run Ulysses scatter independently for each group (within each group head_dim is uniform), then concatenate before the actual attention computation. It's a bit of surgery but not a huge amount of code.
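
The split/scatter/concatenate idea can be sketched the same way. This is again a hypothetical single-process NumPy sketch, not a DeepSpeed API: `split_by_head_type`, `grouped_ulysses` and the `groups` list of `(name, num_heads, head_dim)` are illustrative names, and the head counts are toy values. Because the two groups have different head_dim, they can't be stacked along one head axis, so the sketch assumes each rank's QKV is kept flattened as `[local_seq, sum(num_heads * head_dim)]`, splits it by head type, runs the all-to-all separately per group, and concatenates the flattened results on the feature axis.

```python
import numpy as np


def ulysses_all_to_all(shards):
    """Simulated all-to-all: [local_seq, H, d] per rank -> [full_seq, H // sp, d] per rank."""
    sp_size = len(shards)
    hpr = shards[0].shape[1] // sp_size
    return [
        np.concatenate([s[:, r * hpr:(r + 1) * hpr, :] for s in shards], axis=0)
        for r in range(sp_size)
    ]


def split_by_head_type(flat, groups):
    """Split [local_seq, sum(h * d)] into {name: [local_seq, h, d]} (hypothetical helper)."""
    out, start = {}, 0
    for name, h, d in groups:
        out[name] = flat[:, start:start + h * d].reshape(flat.shape[0], h, d)
        start += h * d
    assert start == flat.shape[1], "groups must cover the whole feature axis"
    return out


def grouped_ulysses(flat_shards, groups):
    """Run the all-to-all per head group, where head_dim is uniform within each group."""
    sp_size = len(flat_shards)
    # 1. Split each rank's flattened QKV by head type.
    per_rank = [split_by_head_type(f, groups) for f in flat_shards]
    # 2. Scatter/gather each group independently.
    gathered = {name: ulysses_all_to_all([p[name] for p in per_rank]) for name, _, _ in groups}
    # 3. Per rank, flatten each group's heads and concatenate on the feature axis.
    return [
        np.concatenate(
            [gathered[name][r].reshape(gathered[name][r].shape[0], -1) for name, _, _ in groups],
            axis=-1,
        )
        for r in range(sp_size)
    ]


rng = np.random.default_rng(0)
sp_size, local_seq = 4, 4  # toy sizes
groups = [("local", 8, 256), ("global", 4, 512)]  # hypothetical head counts
width = sum(h * d for _, h, d in groups)
flat_shards = [rng.standard_normal((local_seq, width)) for _ in range(sp_size)]

outputs = grouped_ulysses(flat_shards, groups)
print(outputs[0].shape)  # (16, 1024)
```

In a real patch, the attention computation would split that concatenated tensor back into its groups (or consume each group directly), and each group's head count must be divisible by the sequence-parallel size.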

There's an open issue somewhere about non-uniform head_dim support in Ulysses. Worth checking if there's been any movement there, since hybrid attention with different head sizes is showing up in a lot of recent architectures.

If you can share the debug log with the exact reshape error, I can probably point to the specific line that needs patching.
