-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better validation of gpu schedules #8068
Conversation
This means we actually print error messages when using exceptions and the makefile
GPU loop constraints were checked in two different places. Checking them in ScheduleFunctions was incorrect because it didn't consider update definitions and specializations. Checking them in FuseGPUThreadLoops was too late, because the Var names have gone (they've been renamed to things like __thread_id_x). Furthermore, some problems were internal errors or runtime errors when they should have been user errors. We allowed 4d thread and block dimensions, but then hit an internal error. This PR centralizes checking of GPU loop structure in CanonicalizeGPUVars and adds more helpful error messages that print the problematic loop structure. E.g: ``` Error: GPU thread loop over f$8.s0.v0 is inside three other GPU thread loops. The maximum number of nested GPU thread loops is 3. The loop nest is: compute_at for g$8: for g$8.s0.v7: for g$8.s0.v6: for g$8.s0.v5: for g$8.s0.v4: gpu_block g$8.s0.v3: gpu_block g$8.s0.v2: gpu_thread g$8.s0.v1: gpu_thread g$8.s0.v0: store_at for f$8: compute_at for f$8: gpu_thread f$8.s0.v1: gpu_thread f$8.s0.v0: ``` Fixes the bug found in #7946
src/ScheduleFunctions.cpp
Outdated
@@ -2269,6 +2269,7 @@ bool validate_schedule(Function f, const Stmt &s, const Target &target, bool is_ | |||
|
|||
std::ostringstream err; | |||
|
|||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is meant to left here in commented-out form, it is essential you add a comment explaining why. (Otherwise, just delete it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, deleted.
…alide/Halide into abadams/validate_gpu_schedules
ready to land? |
* Update makefile to use test/common/terminate_handler.cpp This means we actually print error messages when using exceptions and the makefile * Better validate of GPU schedules GPU loop constraints were checked in two different places. Checking them in ScheduleFunctions was incorrect because it didn't consider update definitions and specializations. Checking them in FuseGPUThreadLoops was too late, because the Var names have gone (they've been renamed to things like __thread_id_x). Furthermore, some problems were internal errors or runtime errors when they should have been user errors. We allowed 4d thread and block dimensions, but then hit an internal error. This PR centralizes checking of GPU loop structure in CanonicalizeGPUVars and adds more helpful error messages that print the problematic loop structure. E.g: ``` Error: GPU thread loop over f$8.s0.v0 is inside three other GPU thread loops. The maximum number of nested GPU thread loops is 3. The loop nest is: compute_at for g$8: for g$8.s0.v7: for g$8.s0.v6: for g$8.s0.v5: for g$8.s0.v4: gpu_block g$8.s0.v3: gpu_block g$8.s0.v2: gpu_thread g$8.s0.v1: gpu_thread g$8.s0.v0: store_at for f$8: compute_at for f$8: gpu_thread f$8.s0.v1: gpu_thread f$8.s0.v0: ``` Fixes the bug found in halide#7946 * Delete dead code * Actually clear the ostringstream
GPU loop constraints were checked in two different places. Checking them
in ScheduleFunctions was incorrect because it didn't consider update
definitions and specializations. Checking them in FuseGPUThreadLoops was
too late, because the Var names have gone (they've been renamed to
things like __thread_id_x). Furthermore, some problems were internal
errors or runtime errors when they should have been user errors. We
allowed 4d thread and block dimensions, but then hit an internal error.
This PR centralizes checking of GPU loop structure in
CanonicalizeGPUVars and adds more helpful error messages that print the
problematic loop structure. E.g:
Fixes the bug found in #7946