-
Notifications
You must be signed in to change notification settings - Fork 2k
[Frontend][MLIR] Fix crash when MLIR_DUMP_PATH is set to a directory #6549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
aeb6226
to
a9d0315
Compare
EC, llvm::sys::fs::CD_CreateAlways); | ||
assert(!EC); | ||
return S; | ||
if (!mlir_dump_stream) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change ensures the dump path is re-evaluated on each call,
enabling flexible, per-kernel IR dumping.
This obviously isn't true since you only check the env variable on the first call. The only meaningful change here is supporting MLIR_DUMP_PATH
as a directory, but in that case why not use TRITON_DUMP_DIR
which is probably exactly what you're looking for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the feedback.
This obviously isn't true since you only check the env variable on the first call.
Correct. What I meant was that in mlir_dumps_or_dbgs()
, the environment variable is checked on every call, but the dump stream is initialized only once. I will correct the commit message and also add a clarifying comment to this function in the updated commit.
The only meaningful here is supporting
MLIR_DUMP_PATH
as a directory,
Correct.
but in that case why not use
TRITON_DUMP_DIR
which is probably exactly what you're looking for.
Actually not. Based on the Tips for hacking section of the README
and the contents of the dumped files,
my understanding is that MLIR_DUMP_PATH
and TRITON_DUMP_DIR
serve different purposes.
MLIR_DUMP_PATH
is controlled by MLIR_ENABLE_DUMP
and is used to store MLIR-native IR dumps.
TRITON_DUMP_DIR
is controlled by TRITON_KERNEL_DUMP
and is used to store Triton IR, LLVM IR, and backend binaries, such as .cubin and .ptx files.
The purpose of this PR is to prevent crashes when MLIR_DUMP_PATH
is set to a directory,
and to make its behavior consistent with what the README
describes.
I'm currently learning the Triton code base, so please let me know if I've missed anything. I'd really appreciate it.
c059d68
to
0e8afd2
Compare
Root cause: The mlir_dumps() function is defined in an anonymous namespace and uses a static instance of llvm::raw_fd_ostream to store MLIR IR dumps in the path set by MLIR_DUMP_PATH. If MLIR_DUMP_PATH is set to a directory, this function attempts to open it as a file, causing a crash. Steps to reproduce the crash: Run the following from the root of the Triton repo to trigger a clean rebuild. 1. Clean and rebuild Triton to ensure the runtime uses current sources: python setup.py clean --all rm -rf build find . -name '*.so' -delete pip install -e . 2. Set environment variables to enable MLIR IR dumping: export MLIR_ENABLE_DUMP=1 export MLIR_DUMP_PATH=/tmp/triton_dumps mkdir -p /tmp/triton_dumps 2. Run a Triton kernel, for instance the vector add tutorial: python python/tutorials/01-vector-add.py 3. Fails with: python: /path/to/triton_repo/python/src/ir.cc:52: llvm::raw_fd_ostream &(anonymous namespace)::mlir_dumps(): Assertion `!EC' failed. Aborted (core dumped) Fix: - Refactor mlir_dumps() to detect if MLIR_DUMP_PATH is a directory. If so, generate a timestamped .mlir file in the directory instead. - Replace assert(!EC) with an explicit error check followed by abort(). This ensures the stream creation failure is always caught at runtime, even in nop builds. - Remove the static qualifier from the llvm::raw_fd_ostream instance. Since mlir_dumps() is defined in an anonymous namespace, the stream object already has internal linkage and does not require static storage duration. This change sets the MLIR dump stream on the first mlir_dumps() call using the MLIR_DUMP_PATH environment variable. Additional changes: - Add test_mlir_dump_path.py to verify the expected behavior. - Update README.md to document MLIR_DUMP_PATH usage, and clean rebuild steps. Fixes triton-lang#6548
0e8afd2
to
db44dd3
Compare
Root cause:
The mlir_dumps() function is defined in an anonymous namespace and uses
a static instance of llvm::raw_fd_ostream to store MLIR IR dumps in the
path set by MLIR_DUMP_PATH. If MLIR_DUMP_PATH is set to a directory,
this function attempts to open it as a file, causing a crash.
Steps to reproduce the crash:
Run the following from the root of the Triton repo to trigger a clean rebuild.
Fails with:
Fix:
Refactor mlir_dumps() to detect if MLIR_DUMP_PATH is a directory. If so,
generate a timestamped .mlir file in the directory instead.
Replace assert(!EC) with an explicit error check followed by abort().
This ensures the stream creation failure is always caught at runtime,
even in nop builds.
Remove the static qualifier from the llvm::raw_fd_ostream instance.
Since mlir_dumps() is defined in an anonymous namespace, the stream
object already has internal linkage and does not require static
storage duration. This change sets the MLIR dump stream on the first
mlir_dumps() call using the MLIR_DUMP_PATH environment variable.
Additional changes:
steps.
Fixes #6548
New contributor declaration
I am not making a trivial change, such as fixing a typo in a comment.
I have written a PR description following these
rules.
I have run
pre-commit run --from-ref origin/main --to-ref HEAD
.Select one of the following.
/test
forlit
tests/unittest
for C++ tests/python/test
for end-to-end testsFILL THIS IN
.Select one of the following.
lit
tests.lit
tests I have added follow these best practices,including the "tests should be minimal" section. (Usually running Python code
and using the instructions it generates is not minimal.)