Runtime error handling is easy to get wrong, hard to get right #7350

Open
@steven-johnson

Description

(From a comment on #6924:)

When an error occurs in any Halide runtime code, you must do BOTH of these things:

  • call halide_error(), either directly, or indirectly (typically via the error() stream output)
  • AND, return a nonzero error code from the original entry point of the runtime call.

It's easy to leave one of these out. If the code fails to return a nonzero error code, any client that replaces the error handler with one that doesn't abort won't see the error (and may crash).
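To make the failure mode concrete, here is a minimal sketch of the buggy and the correct pattern. Everything in the `sketch` namespace is a hypothetical stand-in written for this example: `report_error` plays the role of `halide_error()` with a non-aborting handler installed, and `copy_buffer_buggy` / `copy_buffer_fixed` are invented entry points, not real runtime functions.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-ins for this sketch only; the real declarations
// live in the Halide runtime headers.
namespace sketch {

enum error_code_t : int { success = 0, generic_error = -1 };

// Holds the most recent message so a caller can inspect it.
inline std::string &last_error() {
    static std::string msg;
    return msg;
}

// Plays the role of halide_error() when the client has installed a
// handler that records the message and returns instead of aborting.
inline void report_error(void *user_context, const char *msg) {
    (void)user_context;
    last_error() = msg;
    std::fprintf(stderr, "Error: %s\n", msg);
}

// WRONG: the error is reported, but the entry point still returns 0,
// so a caller that checks the return value believes the call succeeded.
inline int copy_buffer_buggy(void *user_context, const void *src) {
    if (src == nullptr) {
        report_error(user_context, "copy_buffer: src is null");
    }
    return success;
}

// RIGHT: report the error AND return a nonzero code.
inline int copy_buffer_fixed(void *user_context, const void *src) {
    if (src == nullptr) {
        report_error(user_context, "copy_buffer: src is null");
        return generic_error;
    }
    return success;
}

}  // namespace sketch
```

With a non-aborting handler, `copy_buffer_buggy(nullptr, nullptr)` logs the message but returns 0, so the caller carries on as if nothing happened; `copy_buffer_fixed` returns a nonzero code the caller can act on.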

It's way too easy to make this mistake, and way too hard to vet code to verify that all the sites are correct. We need to think about a way to restructure error handling in our runtime code to make this bulletproof. Strawman:

  • struct HalideRuntimeError { int result; };
  • Compile-time assertions to verify it is a plain old data structure that is the same size as int (i.e., bit-compatible)
  • Only ctor is of the form HalideRuntimeError(halide_error_code_t value, const char *msg = "");
  • Calling this ctor implicitly calls halide_error() with the msg (or some useful default if empty)
  • Probably have some macro-ish wrapper to allow constructing text via << as we do now?
  • Copy ctor is deleted, but move ctor exists
  • All internal code in the Runtime that currently calls error() or halide_error() is revised to return one of these to its caller. (All 'toplevel' halide runtime calls will just return the .result field.)
  • We start adapting code at the leaf level and work our way back up; might need some temporary scaffolding to allow everything to get updated properly
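A rough sketch of what the strawman might look like, under several assumptions that are not part of the proposal: the `strawman` namespace, the `error_code_t` enum, `report_error` (standing in for `halide_error()`), and the `check_size` / `allocate` / `toplevel_allocate` functions are all hypothetical. The sketch also assumes the ctor skips reporting when the code is success, and it passes a null `user_context` because the strawman ctor has no parameter for one. The `<<` wrapper is left out.

```cpp
#include <string>
#include <type_traits>

namespace strawman {

// Hypothetical stand-in for halide_error_code_t.
enum error_code_t : int { success = 0, generic_error = -1 };

// Hypothetical stand-in for halide_error(): appends to a log instead of aborting.
inline std::string &reported() {
    static std::string log;
    return log;
}

inline void report_error(void *user_context, const char *msg) {
    (void)user_context;
    reported() += msg;
    reported() += '\n';
}

struct HalideRuntimeError {
    int result;

    // The only way to make an error value: constructing one reports it.
    // Assumption: a success code does not trigger a report.
    HalideRuntimeError(error_code_t value, const char *msg = "")
        : result(value) {
        if (value != success) {
            report_error(nullptr, (msg && *msg) ? msg : "unspecified runtime error");
        }
    }

    // Copying is deleted so an error cannot be duplicated; moving lets it
    // travel back up the call chain.
    HalideRuntimeError(const HalideRuntimeError &) = delete;
    HalideRuntimeError(HalideRuntimeError &&) = default;
};

// Compile-time checks that the struct is bit-compatible with int.
static_assert(sizeof(HalideRuntimeError) == sizeof(int),
              "HalideRuntimeError must be the same size as int");
static_assert(std::is_standard_layout<HalideRuntimeError>::value,
              "HalideRuntimeError must be standard-layout");

// Leaf-level internal function: returns the error object, not an int.
inline HalideRuntimeError check_size(int size) {
    if (size < 0) {
        return HalideRuntimeError(generic_error, "check_size: negative size");
    }
    return HalideRuntimeError(success);
}

// Intermediate internal function: propagates the leaf's error by move.
inline HalideRuntimeError allocate(int size) {
    HalideRuntimeError err = check_size(size);
    if (err.result != success) {
        return err;
    }
    return HalideRuntimeError(success);
}

// 'Toplevel' runtime call: unwraps to the plain int the public API returns.
inline int toplevel_allocate(int size) {
    return allocate(size).result;
}

}  // namespace strawman
```

Because the only way to produce a nonzero `result` is through the reporting ctor, internal code cannot return an error without also reporting it. The reverse mistake (reporting without returning) would still need the direct `halide_error()` calls to be removed or hidden from internal code.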
