Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better validation of gpu schedules #8068

Merged
merged 6 commits into from
Feb 7, 2024
Merged

Conversation

abadams
Copy link
Member

@abadams abadams commented Feb 5, 2024

GPU loop constraints were checked in two different places. Checking them
in ScheduleFunctions was incorrect because it didn't consider update
definitions and specializations. Checking them in FuseGPUThreadLoops was
too late, because the Var names have gone (they've been renamed to
things like __thread_id_x). Furthermore, some problems were internal
errors or runtime errors when they should have been user errors. We
allowed 4d thread and block dimensions, but then hit an internal error.

This PR centralizes checking of GPU loop structure in
CanonicalizeGPUVars and adds more helpful error messages that print the
problematic loop structure. E.g:

Error: GPU thread loop over f$8.s0.v0 is inside three other GPU thread
loops. The maximum number of nested GPU thread loops is 3. The loop nest
is:
compute_at for g$8:
 for g$8.s0.v7:
  for g$8.s0.v6:
   for g$8.s0.v5:
    for g$8.s0.v4:
     gpu_block g$8.s0.v3:
      gpu_block g$8.s0.v2:
       gpu_thread g$8.s0.v1:
        gpu_thread g$8.s0.v0:
         store_at for f$8:
          compute_at for f$8:
           gpu_thread f$8.s0.v1:
            gpu_thread f$8.s0.v0:

Fixes the bug found in #7946

This means we actually print error messages when using exceptions and
the makefile
GPU loop constraints were checked in two different places. Checking them
in ScheduleFunctions was incorrect because it didn't consider update
definitions and specializations. Checking them in FuseGPUThreadLoops was
too late, because the Var names have gone (they've been renamed to
things like __thread_id_x). Furthermore, some problems were internal
errors or runtime errors when they should have been user errors. We
allowed 4d thread and block dimensions, but then hit an internal error.

This PR centralizes checking of GPU loop structure in
CanonicalizeGPUVars and adds more helpful error messages that print the
problematic loop structure. E.g:

```
Error: GPU thread loop over f$8.s0.v0 is inside three other GPU thread
loops. The maximum number of nested GPU thread loops is 3. The loop nest
is:
compute_at for g$8:
 for g$8.s0.v7:
  for g$8.s0.v6:
   for g$8.s0.v5:
    for g$8.s0.v4:
     gpu_block g$8.s0.v3:
      gpu_block g$8.s0.v2:
       gpu_thread g$8.s0.v1:
        gpu_thread g$8.s0.v0:
         store_at for f$8:
          compute_at for f$8:
           gpu_thread f$8.s0.v1:
            gpu_thread f$8.s0.v0:
```

Fixes the bug found in #7946
@@ -2269,6 +2269,7 @@ bool validate_schedule(Function f, const Stmt &s, const Target &target, bool is_

std::ostringstream err;

/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is meant to left here in commented-out form, it is essential you add a comment explaining why. (Otherwise, just delete it.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, deleted.

@steven-johnson
Copy link
Contributor

ready to land?

@abadams abadams merged commit 39e5c08 into main Feb 7, 2024
19 checks passed
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
* Update makefile to use test/common/terminate_handler.cpp

This means we actually print error messages when using exceptions and
the makefile

* Better validate of GPU schedules

GPU loop constraints were checked in two different places. Checking them
in ScheduleFunctions was incorrect because it didn't consider update
definitions and specializations. Checking them in FuseGPUThreadLoops was
too late, because the Var names have gone (they've been renamed to
things like __thread_id_x). Furthermore, some problems were internal
errors or runtime errors when they should have been user errors. We
allowed 4d thread and block dimensions, but then hit an internal error.

This PR centralizes checking of GPU loop structure in
CanonicalizeGPUVars and adds more helpful error messages that print the
problematic loop structure. E.g:

```
Error: GPU thread loop over f$8.s0.v0 is inside three other GPU thread
loops. The maximum number of nested GPU thread loops is 3. The loop nest
is:
compute_at for g$8:
 for g$8.s0.v7:
  for g$8.s0.v6:
   for g$8.s0.v5:
    for g$8.s0.v4:
     gpu_block g$8.s0.v3:
      gpu_block g$8.s0.v2:
       gpu_thread g$8.s0.v1:
        gpu_thread g$8.s0.v0:
         store_at for f$8:
          compute_at for f$8:
           gpu_thread f$8.s0.v1:
            gpu_thread f$8.s0.v0:
```

Fixes the bug found in halide#7946

* Delete dead code

* Actually clear the ostringstream
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants