Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HEXAGON] Permit hvx_128 instruction in Blur Target #8197

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

FabianSchuetze
Copy link
Contributor

I noticed that the variable HL_TARGET is not used in the generator when building the blur app for a Hexagon device. Instead, the target is always the generic
arm-64-android. The output from calling the generator is:

➜  blur git:(blur_hexagon) ✗ HL_TARGET=arm-64-android-hvx_128 ./adb_run_on_device.sh
...
bin/host/halide_blur.generator -g halide_blur -e static_library,h,registration,stmt,assembly -o bin/arm-64-android target=arm-64-android
...

The app then runs with the following timings:

times: 0.037547 0.004319 0.001070

In contrast, if explicitly use the environment variable HL_TARGET in the Makefile, the output from the build is:

bin/host/halide_blur.generator -g halide_blur -e static_library,h,registration,stmt,assembly -o bin/arm-64-android target=arm-64-android-hvx_128

The timings for the app on an Snapdragon 865 are:

times: 0.002937 0.001223 0.000097

To permit building the app with the hvx instructions I had to change the following lines in the generator:

+++ b/apps/blur/halide_blur_generator.cpp
@@ -86,8 +86,6 @@ public:
             const int vector_size = 128;
 
             blur_y.compute_root()
-                .hexagon()
-                .prefetch(input, y, y, 2)
                 .split(y, y, yi, 128)
                 .parallel(y)
                 .vectorize(x, vector_size * 2);

Because of the lower latency and higher degree of control, I wonder whether these changes seem beneficial or if the results are idiosyncratic to my setup. I understand the hexagon() call should be used then work is shared between the CPU and the DSP.

@FabianSchuetze FabianSchuetze changed the title Permit hvx_128 instruction in Blur Target [HEXAGON] Permit hvx_128 instruction in Blur Target Apr 16, 2024
@abadams
Copy link
Member

abadams commented Apr 16, 2024

That change to the schedule actually disables use of hvx. I think the issue is in adb_run_on_device.sh, which seems to intentionally drop HL_TARGET in favor of a locally defined APP_TARGET. It should probably use HL_TARGET, because, as you've noticed, right now it's not behaving as expected.

@FabianSchuetze
Copy link
Contributor Author

Thanks for the review and the comments!

I cannot run the example from the main branch with the hexagon() (and prefetch())instructions.

The output from running HL_TARGET=arm-64-android-hvx_128 ./adb_run_on_device.sh is that the program aborts on the target:

➜  blur git:(blur_hexagon) ✗ HL_TARGET=arm-64-android-hvx_128 ./adb_run_on_device.sh
g++ -O3 -std=c++17 -I /home/fabian/Documents/work/github/Halide/distrib/include/ -I /home/fabian/Documents/work/github/Halide/distrib/tools/  -Wall -Werror -Wno-unused-function -Wcast-qual -Wignored-qualifiers -Wno-comment -Wsign-compare -Wno-unknown-warning-option -Wno-psabi -fno-rtti halide_blur_generator.cpp /home/fabian/Documents/work/github/Halide/distrib/tools/GenGen.cpp -o bin/host/halide_blur.generator /home/fabian/Documents/work/github/Halide/distrib/lib/libHalide.a -ldl -lpthread -lz -lzstd -ltinfo
bin/host/halide_blur.generator -g halide_blur -e static_library,h,registration,stmt,assembly -o bin/arm-64-android target=arm-64-android-hvx_128
/home/fabian/Downloads/android-ndk-r25c//toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android26-clang++ -O3 -std=c++17 -I /home/fabian/Documents/work/github/Halide/distrib/include/ -I /home/fabian/Documents/work/github/Halide/distrib/tools/  -Wall -Werror -Wno-unused-function -Wcast-qual -Wignored-qualifiers -Wno-comment -Wsign-compare -Wno-unknown-warning-option -Wno-psabi -fno-rtti -fopenmp -Wall -O2 -Ibin/arm-64-android test.cpp bin/arm-64-android/halide_blur.a -o bin/arm-64-android/test -llog -fPIE -pie -static-libstdc++
rm bin/arm-64-android/halide_blur.a
../../src/runtime/hexagon_remote/bin/arm-64-andr... 1 file pushed. 5.5 MB/s (29680 bytes in 0.005s)
../../src/runtime/hexagon_remote/bin/v65/signed_... 1 file pushed. 6.9 MB/s (47058 bytes in 0.007s)
cp: bad '/system/lib/rfsa/adsp/testsig*': No such file or directory
bin/arm-64-android/test: 1 file pushed. 57.1 MB/s (2142200 bytes in 0.036s)
Aborted 

Please note that I do have to change the Makefile as in this PR. If I instead replace APP_TARGET=$(HL_TARGET) in adb_run_on_device.sh, the test binary gets built by my local gcc compiler, instead of the one in ANDROID_NDK_ROOT dir, see the output below:

g++ -O3 -std=c++17 -I /home/fabian/Documents/work/github/Halide/distrib/include/ -I /home/fabian/Documents/work/github/Halide/distrib/tools/  -Wall -Werror -Wno-unused-function -Wcast-qual -Wignored-qualifiers -Wno-comment -Wsign-compare -Wno-unknown-warning-option -Wno-psabi -fno-rtti -fopenmp -Wall -O2 -Ibin/arm-64-android-hvx_128 test.cpp bin/arm-64-android-hvx_128/halide_blur.a -o bin/arm-64-android-hvx_128/test -ldl -lpthread -lz
/usr/bin/ld: bin/arm-64-android-hvx_128/halide_blur.a(halide_blur.a.o): Relocations in generic ELF (EM: 183)
/usr/bin/ld: bin/arm-64-android-hvx_128/halide_blur.a(halide_blur.a.o): Relocations in generic ELF (EM: 183)

@abadams
Copy link
Member

abadams commented Apr 17, 2024

Sounds like your device might not have the test signatures necessary to run user code on hexagon. Your change made it work by not running code on hexagon.

The Makefile change is definitely wrong. The way these makefiles work is that 'make bin/some_target/test' should build for the target 'some_target', without reference to HL_TARGET. Only top-level makefile rules like 'make test' use HL_TARGET. They're built this way because otherwise if you try to build for multiple targets (e.g. with and without profiling) without cleaning the build tree you can get into situations where make can't find dependencies, or picks up the wrong dependencies. Makefile rules shouldn't reference external state if that state may change without doing a 'make clean'.

Looks like this Makefile requires you to have set a value of CXX-my_halide_target and CXXFLAGS-my_halide_target and LDFLAGS-my_halide_target for each Halide target you want to use, and that apps/support/Makefile.inc is missing definitions of {CXX,LDFLAGS,CXXFLAGS}-arm-64-android-hvx_128

@FabianSchuetze
Copy link
Contributor Author

Thanks for your detailed answer. I have opened up a dedicated ticket to debug the hexagon issues #8201

@FabianSchuetze
Copy link
Contributor Author

Thanks for the review. I have updated the PR. Would that be a reasonable approach?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants