Commit db9c48d
authored
bugfix: fix the misaligned address bug of norm kernels for certain shapes (#636)
This PR fixes the issue #634, which is brought by #592 .
If we want to use 16-bytes vectorized read/write, we need to confirm the
address is aligned to 16 bytes.
When `num_warps` is not a multiple of 4 (4*sizeof(float) = 16), the
address of `smem + num_warps` might not align to 16 bytes.
We can fix this by shifting the start offset of vectorized read/write to
`smem + ceil_div(num_warps, 4) * 4` to force the alignment.
cc @ovowei @Abatom1 parent ae501ed commit db9c48d
2 files changed
+9
-7
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
127 | 127 | | |
128 | 128 | | |
129 | 129 | | |
| 130 | + | |
130 | 131 | | |
131 | 132 | | |
132 | 133 | | |
| |||
151 | 152 | | |
152 | 153 | | |
153 | 154 | | |
154 | | - | |
| 155 | + | |
155 | 156 | | |
156 | 157 | | |
157 | 158 | | |
| |||
185 | 186 | | |
186 | 187 | | |
187 | 188 | | |
188 | | - | |
| 189 | + | |
189 | 190 | | |
190 | 191 | | |
191 | 192 | | |
| |||
247 | 248 | | |
248 | 249 | | |
249 | 250 | | |
250 | | - | |
| 251 | + | |
| 252 | + | |
251 | 253 | | |
252 | 254 | | |
253 | 255 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
65 | 65 | | |
66 | 66 | | |
67 | 67 | | |
68 | | - | |
| 68 | + | |
69 | 69 | | |
70 | 70 | | |
71 | 71 | | |
| |||
83 | 83 | | |
84 | 84 | | |
85 | 85 | | |
86 | | - | |
| 86 | + | |
87 | 87 | | |
88 | 88 | | |
89 | 89 | | |
| |||
105 | 105 | | |
106 | 106 | | |
107 | 107 | | |
108 | | - | |
| 108 | + | |
109 | 109 | | |
110 | 110 | | |
111 | 111 | | |
| |||
123 | 123 | | |
124 | 124 | | |
125 | 125 | | |
126 | | - | |
| 126 | + | |
127 | 127 | | |
128 | 128 | | |
129 | 129 | | |
| |||
0 commit comments