Tuner transpose fails on various specific sizes #532

Open
baryluk opened this issue Feb 10, 2024 · 1 comment

@baryluk
Contributor

baryluk commented Feb 10, 2024

CLBlast-1.6.2-linux-x86_64

LD_LIBRARY_PATH=./lib ./bin/clblast_tuner_transpose_fast --platform 1 -m 2 -n 16


|   ID | total |               param |      local      |      global     |       compiles |         time |   GB/s |            status |
x------x-------x---------------------x-----------------x-----------------x----------------x--------------x--------x-------------------x
|  ref |     - |                   - |       8       8 |       8      16 |             OK |      0.03 ms |      - |      reference OK |
x------x-------x---------------------x-----------------x-----------------x----------------x--------------x--------x-------------------x
|    1 |    52 |    4    1    0    0 |       4       4 |       4      16 |   OK     84 ms |      0.03 ms |      - | L2 error 4.30e-01 | <-- skipping
|    2 |    52 |    4    1    0    1 |       4       4 |       4      16 |   OK     81 ms |      0.03 ms |      - | L2 error 1.20e+00 | <-- skipping
|    3 |    52 |    4    1    1    0 |       4       4 |       4      16 |   OK     83 ms |      0.03 ms |      - | L2 error 4.30e-01 | <-- skipping
|    4 |    52 |    4    1    1    1 |       4       4 |       4      16 |   OK     95 ms |      0.03 ms |      - | L2 error 1.20e+00 | <-- skipping
|    5 |    52 |    4    2    0    0 |       4       4 |       4       8 |   OK     95 ms |      0.03 ms |      - | L2 error 6.83e-01 | <-- skipping
|    6 |    52 |    4    2    0    1 |       4       4 |       4       8 |   OK     95 ms |      0.03 ms |      - | L2 error 1.61e+00 | <-- skipping
|    7 |    52 |    4    2    1    0 |       4       4 |       4       8 |   OK     91 ms |      0.03 ms |      - | L2 error 6.83e-01 | <-- skipping
^C

Examples of sizes that fail:

m==2, (n==4 || n>=16)
m==3, (n==15 || n==16 || n>=25)
m==4, (n==9 || n>=10)
m==5, (n==1 || n==8 || n==9 || n==12 || n>=20)
m==9, (n==8 || n==10 || n==15 || 20<=n<=48 || n>=64)
m==10, (20<=n<=32 || n>=50)
m==12, (20<=n<=50 || n>=92)
#!/bin/bash

# Candidate sizes, used for both m and n.
sizes="1 2 3 4 5 8 9 10 12 15 16 20 25 30 32 48 50 64 92 100 128 156 200 256 300 384 400 500 512 1000 1024 1200 1600 2000 2048 3000 4000 4096 5000 8192"

for m in $sizes; do
  for n in $sizes; do
    echo "m: $m n: $n" "$(LD_LIBRARY_PATH=./lib ./bin/clblast_tuner_transpose_fast --platform 1 -runs 1 -m "$m" -n "$n" | grep -E 'Best parameters:')"
  done
done
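
To see at a glance how many configurations were rejected for a given size, the tuner's table output could be filtered for the `<-- skipping` marker that appears in the log above. This is only a minimal sketch: `count_skipped` is a hypothetical helper name, not part of CLBlast, and it assumes the marker text is printed exactly as in that log.

```shell
#!/bin/bash

# Hypothetical helper: count the table rows on stdin that the tuner marked
# with "<-- skipping". Assumes the marker text matches the log shown above.
count_skipped() {
  # grep -c prints the number of matching lines; "|| true" keeps the exit
  # status at zero when there are no matches.
  grep -c -- '<-- skipping' || true
}

# Possible usage (not executed here, since it needs the CLBlast binaries):
#   LD_LIBRARY_PATH=./lib ./bin/clblast_tuner_transpose_fast --platform 1 -m 2 -n 16 | count_skipped
```

A count equal to the value in the `total` column would mean that every configuration was skipped for that size.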
@CNugteren
Owner

That is expected behaviour. The tuner simply runs a specific kernel, and certain kernels have constraints on the input sizes that also depend on the tuner parameters. That's why those cases are skipped.

Furthermore, it is probably not a good idea to tune for such tiny input sizes, because the main thing you'll measure is kernel launch overhead and similar effects. It is probably best to start at 64x64 or even higher.
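
As a rough illustration of that suggestion, the sweep from the original report could be restricted to sizes of at least 64. This is only a sketch: `TUNER`, `MIN_SIZE` and `sizes_at_least` are hypothetical names introduced here, the binary path and flags are copied from the report above, and 64 is just the suggested starting point, not a verified threshold.

```shell
#!/bin/bash

# Hypothetical settings: path to the tuner binary and smallest size to try.
TUNER=${TUNER:-./bin/clblast_tuner_transpose_fast}
MIN_SIZE=${MIN_SIZE:-64}

# Print the candidate sizes from the original sweep that are >= the given
# minimum, one per line.
sizes_at_least() {
  local min=$1 s
  for s in 1 2 3 4 5 8 9 10 12 15 16 20 25 30 32 48 50 64 92 100 128 156 \
           200 256 300 384 400 500 512 1000 1024 1200 1600 2000 2048 3000 \
           4000 4096 5000 8192; do
    if [ "$s" -ge "$min" ]; then
      echo "$s"
    fi
  done
}

# Only run the sweep when the tuner binary is actually present.
if [ -x "$TUNER" ]; then
  for m in $(sizes_at_least "$MIN_SIZE"); do
    for n in $(sizes_at_least "$MIN_SIZE"); do
      echo "m: $m n: $n" "$(LD_LIBRARY_PATH=./lib "$TUNER" --platform 1 -runs 1 -m "$m" -n "$n" | grep -E 'Best parameters:')"
    done
  done
else
  echo "tuner not found at $TUNER; nothing to run" >&2
fi
```

Setting `MIN_SIZE` to a larger value narrows the sweep further. Whether every size at or above 64 satisfies the kernel constraints for every parameter combination is not established here.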
