Skip to content

[Frontend][MLIR] Fix crash when MLIR_DUMP_PATH is set to a directory #6549

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

peymanbr
Copy link

@peymanbr peymanbr commented Apr 21, 2025

Root cause:

The mlir_dumps() function is defined in an anonymous namespace and uses
a static instance of llvm::raw_fd_ostream to store MLIR IR dumps in the
path set by MLIR_DUMP_PATH. If MLIR_DUMP_PATH is set to a directory,
this function attempts to open it as a file, causing a crash.

Steps to reproduce the crash:

Run the following from the root of the Triton repo to trigger a clean rebuild.

  1. Clean and rebuild Triton to ensure the runtime uses current sources:
python setup.py clean --all
rm -rf build
find . -name '*.so' -delete
pip install -e .
  1. Set environment variables to enable MLIR IR dumping:
export MLIR_ENABLE_DUMP=1
export MLIR_DUMP_PATH=/tmp/triton_dumps
mkdir -p /tmp/triton_dumps
  1. Run a Triton kernel, for instance the vector add tutorial:
python python/tutorials/01-vector-add.py

Fails with:

```console python: /path/to/triton_repo/python/src/ir.cc:52: llvm::raw_fd_ostream &(anonymous namespace)::mlir_dumps(): Assertion `!EC' failed. Aborted (core dumped) ```

Fix:

  • Refactor mlir_dumps() to detect if MLIR_DUMP_PATH is a directory. If so,
    generate a timestamped .mlir file in the directory instead.

  • Replace assert(!EC) with an explicit error check followed by abort().
    This ensures the stream creation failure is always caught at runtime,
    even in nop builds.

  • Remove the static qualifier from the llvm::raw_fd_ostream instance.
    Since mlir_dumps() is defined in an anonymous namespace, the stream
    object already has internal linkage and does not require static
    storage duration. This change sets the MLIR dump stream on the first
    mlir_dumps() call using the MLIR_DUMP_PATH environment variable.

Additional changes:

  • Add test_mlir_dump_path.py to verify the expected behavior.
  • Update README.md to document MLIR_DUMP_PATH usage, and clean rebuild
    steps.

Fixes #6548

New contributor declaration

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these
    rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
    • This PR does not need a test because FILL THIS IN.
  • Select one of the following.

    • I have not added any lit tests.
    • The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)

@peymanbr peymanbr requested a review from ptillet as a code owner April 21, 2025 19:24
@peymanbr peymanbr force-pushed the fix_mlir_dump_path branch from aeb6226 to a9d0315 Compare April 22, 2025 08:24
EC, llvm::sys::fs::CD_CreateAlways);
assert(!EC);
return S;
if (!mlir_dump_stream) {
Copy link
Contributor

@peterbell10 peterbell10 Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change ensures the dump path is re-evaluated on each call,
enabling flexible, per-kernel IR dumping.

This obviously isn't true since you only check the env variable on the first call. The only meaningful change here is supporting MLIR_DUMP_PATH as a directory, but in that case why not use TRITON_DUMP_DIR which is probably exactly what you're looking for.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback.

This obviously isn't true since you only check the env variable on the first call.

Correct. What I meant was that in mlir_dumps_or_dbgs(), the environment variable is checked on every call, but the dump stream is initialized only once. I will correct the commit message and also add a clarifying comment to this function in the updated commit.

The only meaningful here is supporting MLIR_DUMP_PATH as a directory,

Correct.

but in that case why not use TRITON_DUMP_DIR which is probably exactly what you're looking for.

Actually not. Based on the Tips for hacking section of the README and the contents of the dumped files,
my understanding is that MLIR_DUMP_PATH and TRITON_DUMP_DIR serve different purposes.
MLIR_DUMP_PATH is controlled by MLIR_ENABLE_DUMP and is used to store MLIR-native IR dumps.
TRITON_DUMP_DIR is controlled by TRITON_KERNEL_DUMP and is used to store Triton IR, LLVM IR, and backend binaries, such as .cubin and .ptx files.

The purpose of this PR is to prevent crashes when MLIR_DUMP_PATH is set to a directory,
and to make its behavior consistent with what the README describes.

I'm currently learning the Triton code base, so please let me know if I've missed anything. I'd really appreciate it.

@peymanbr peymanbr force-pushed the fix_mlir_dump_path branch from c059d68 to 0e8afd2 Compare April 25, 2025 10:15
@peymanbr peymanbr changed the title [Python][MLIR] Fix crash when MLIR_DUMP_PATH is set to a directory [Frontend][MLIR] Fix crash when MLIR_DUMP_PATH is set to a directory Apr 25, 2025
Root cause:
The mlir_dumps() function is defined in an anonymous namespace and uses
a static instance of llvm::raw_fd_ostream to store MLIR IR dumps in the
path set by MLIR_DUMP_PATH. If MLIR_DUMP_PATH is set to a directory,
this function attempts to open it as a file, causing a crash.

Steps to reproduce the crash:

Run the following from the root of the Triton repo to trigger a clean rebuild.

1. Clean and rebuild Triton to ensure the runtime uses current sources:
    python setup.py clean --all
    rm -rf build
    find . -name '*.so' -delete
    pip install -e .

2. Set environment variables to enable MLIR IR dumping:
    export MLIR_ENABLE_DUMP=1
    export MLIR_DUMP_PATH=/tmp/triton_dumps
    mkdir -p /tmp/triton_dumps

2. Run a Triton kernel, for instance the vector add tutorial:
    python python/tutorials/01-vector-add.py

3. Fails with:
    python: /path/to/triton_repo/python/src/ir.cc:52:
    llvm::raw_fd_ostream &(anonymous namespace)::mlir_dumps(): Assertion `!EC' failed.
    Aborted (core dumped)

Fix:

 - Refactor mlir_dumps() to detect if MLIR_DUMP_PATH is a directory. If so,
   generate a timestamped .mlir file in the directory instead.

 - Replace assert(!EC) with an explicit error check followed by abort().
   This ensures the stream creation failure is always caught at runtime,
   even in nop builds.

 - Remove the static qualifier from the llvm::raw_fd_ostream instance.
   Since mlir_dumps() is defined in an anonymous namespace, the stream
   object already has internal linkage and does not require static
   storage duration. This change sets the MLIR dump stream on the first
   mlir_dumps() call using the MLIR_DUMP_PATH environment variable.

Additional changes:
- Add test_mlir_dump_path.py to verify the expected behavior.
- Update README.md to document MLIR_DUMP_PATH usage, and clean rebuild
  steps.

Fixes triton-lang#6548
@peymanbr peymanbr force-pushed the fix_mlir_dump_path branch from 0e8afd2 to db44dd3 Compare April 25, 2025 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash when MLIR_DUMP_PATH is set to a directory instead of a file
2 participants