Test case webgpu:shader,execution,expression,call,builtin,subgroupBroadcast:compute,all_active:wgSize=[3,3,3] produces a thread group with 3 × 3 × 3 = 27 threads. Test subcase 3 attempts to subgroupBroadcast from subgroup lane 3. When the subgroup size is 4 or 8, the last subgroup to execute holds only three threads (lanes 0 to 2), so it has no thread executing at subgroup index 3.
Subgroup size 4 will have:
- 0, 1, 2, 3
- 4, 5, 6, 7
- 8, 9, 10, 11
- 12, 13, 14, 15
- 16, 17, 18, 19
- 20, 21, 22, 23
- 24, 25, 26
Subgroup size 8 will have:
- 0 - 7
- 8 - 15
- 16 - 23
- 24, 25, 26
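The layouts above can be reproduced with a short sketch. It assumes, as the lists do, that threads are packed into subgroups linearly in index order, which real hardware is not guaranteed to do. The helper names `subgroupLaneCounts` and `allSubgroupsHaveLane` are hypothetical and are not part of the CTS.

```typescript
// Hypothetical helper: number of active lanes in each subgroup, assuming
// threads are packed into subgroups linearly in index order.
function subgroupLaneCounts(totalThreads: number, subgroupSize: number): number[] {
  const counts: number[] = [];
  for (let start = 0; start < totalThreads; start += subgroupSize) {
    // The final subgroup may be only partially filled.
    counts.push(Math.min(subgroupSize, totalThreads - start));
  }
  return counts;
}

// Hypothetical helper: true when every subgroup has an active thread at `lane`.
function allSubgroupsHaveLane(totalThreads: number, subgroupSize: number, lane: number): boolean {
  return subgroupLaneCounts(totalThreads, subgroupSize).every((count) => count > lane);
}

const totalThreads = 3 * 3 * 3; // wgSize=[3,3,3]
for (const size of [4, 8, 16]) {
  console.log(size, subgroupLaneCounts(totalThreads, size), allSubgroupsHaveLane(totalThreads, size, 3));
}
```

Under that packing assumption, a broadcast from lane 3 is only safe when the last, partially filled subgroup still has more than three active lanes.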
Only at subgroup size 16 do all active subgroups include an active lane at index 3: