forked from triton-lang/triton
Fix underflow in Triton's highestPowOf2Divisor function when the input is INT_MIN #11
Closed
Conversation
Not sure this is worth merging? I was annoyed by the long sha256-based cache directory names, mostly 64 characters. So I quickly added base64-based shorter cache directory names. Instead of fixing a dozen places that use `hashlib.sha256`, I patched the cache manager. The 64-character names are mostly reduced to 43-44 characters. A comparison:
```
> % ls -l $TRITON_CACHE_DIR
total 0
drwxr-xr-x 1 minjang users  40 Aug 21 19:02 44ae4aee7ef0ee0dd54e860cf44627e3b6cedabe87a228ac75988301b8a6bf60
drwxr-xr-x 1 minjang users  26 Aug 21 19:02 82dc2c9a5508bf07c72e02353c1e751dc54aae85666f139b2867b0a1e95e0e7b
drwxr-xr-x 1 minjang users 226 Aug 21 19:02 b8e240968a85711ba57b17bf8450f1ffbc85a8de8cd1f47aa87b241b53f9bf60
drwxr-xr-x 1 minjang users  26 Aug 21 19:03 gtwsmlUIvwfHLgI1PB51HcVKroVmbxObKGewoeleDns
drwxr-xr-x 1 minjang users  40 Aug 21 19:03 RK5K7n7w7g3VToYM9EYn47bO2r6HoiisdZiDAbimv2A
drwxr-xr-x 1 minjang users 226 Aug 21 19:03 uOJAloqFcRulexe_hFDx_7yFqN6M0fR6qHskG1P5v2A
```
`test_core.py` runs without any errors, and the cache directory contains only the base64-based shorter names.
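For reference, here is a standalone C++ sketch (not the actual patch, which touches the Python cache manager) of why a 32-byte SHA-256 digest encodes to 43 characters under padding-free url-safe base64; the alphabet below matches Python's `base64.urlsafe_b64encode`:
```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Minimal url-safe base64 without padding: ceil(32 * 8 / 6) = 43 characters
// for a 32-byte SHA-256 digest, versus 64 characters for the hex form.
std::string base64url(const std::vector<uint8_t> &data) {
  static const char tbl[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
  std::string out;
  size_t i = 0;
  for (; i + 3 <= data.size(); i += 3) { // whole 3-byte groups -> 4 chars
    uint32_t v = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2];
    out += tbl[(v >> 18) & 63];
    out += tbl[(v >> 12) & 63];
    out += tbl[(v >> 6) & 63];
    out += tbl[v & 63];
  }
  if (data.size() - i == 2) { // 2 trailing bytes -> 3 chars, no '=' padding
    uint32_t v = (data[i] << 16) | (data[i + 1] << 8);
    out += tbl[(v >> 18) & 63];
    out += tbl[(v >> 12) & 63];
    out += tbl[(v >> 6) & 63];
  } else if (data.size() - i == 1) { // 1 trailing byte -> 2 chars
    uint32_t v = data[i] << 16;
    out += tbl[(v >> 18) & 63];
    out += tbl[(v >> 12) & 63];
  }
  return out;
}

int main() {
  std::vector<uint8_t> digest(32, 0xab); // stand-in for a real SHA-256 digest
  std::cout << base64url(digest).size() << "\n"; // prints 43
}
```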
The core Triton is a small number of people, and we receive many PRs (thank you!). To help us review your code more quickly, **if you are a new contributor (less than 3 PRs merged) we ask that you complete the following tasks and include the filled-out checklist in your PR description.** Complete the following tasks before sending your PR, and replace `[ ]` with `[x]` to indicate you have done them.
- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [ ] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [ ] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because `PR merely modifies the implementation of the function, removing the superfluous parts.`.
- Select one of the following.
  - [ ] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)
…ng#4539) This is motivated by triton-lang#4509. The crux of the problem is that the Triton code generator needs to inspect a function's arguments / attributes / types in order to determine how it should be called. This meant that "implementation details" like whether a function is a builtin needed to be exposed in the "interface" `tl.extra.libdevice` module, instead of just residing in `tl.extra.cuda.libdevice`. Moreover, this meant that libdevice functions marked as `@core.extern` in the interface could not be implemented via JitFunctions. Allowing each backend to provide its own module map solves this problem as the code generator can inspect the actual function implementation.
Firmware updated after upgrading the kernel-mode driver to the newer version 6.2.0. Tested test_randn and test_ptx_cast for an hour as the usual reproducer; they no longer show any failures. --------- Co-authored-by: Lei Zhang <antiagainst@gmail.com>
Missed this one test the last time.
chsigg (Member) reviewed on Aug 23, 2024 and left a comment:
LGTM
1. Fixed the errors in the formulas in the comments.
2. When asserts are enabled in DEBUG mode, using the list type reports an error. Additionally, the Tensor returned by the internal interface already guarantees that the shape is a list, so no check is required.

The core Triton is a small number of people, and we receive many PRs (thank you!). To help us review your code more quickly, **if you are a new contributor (less than 3 PRs merged) we ask that you complete the following tasks and include the filled-out checklist in your PR description.** Complete the following tasks before sending your PR, and replace `[ ]` with `[x]` to indicate you have done them.
- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because it does not affect working code.
- Select one of the following.
  - [x] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)

Co-authored-by: 谢双镱 <xieshuangyi@bytedance.com>
… code (triton-lang#4561) The knob is only for debugging purposes and should be removed once we've figured out the issue
…ay escape out of the loop (triton-lang#4567) Merging 7c90081: '[SCF][PIPELINE] Handle the case when values from the peeled prologue may escape out of the loop' from the upstream LLVM into Triton Pipeline Expander "Previously the values in the peeled prologue that weren't treated with the predicateFn were passed to the loop body without any other predication. If those values are later used outside of the loop body, they may be incorrect if the num iterations is smaller than num stages - 1. We need similar masking for those, as is done in the main loop body, using already existing predicates."
…ng#4255) PyTorch has its own ways of loading the libamdhip64.so library, and the original logic has problems detecting this shared object because it can be placed at any location via `venv`/`DT_RUNPATH`/`LD_LIBRARY_PATH`/`/etc/ld.so.conf.d/*.conf`/`LD_PRELOAD`. Notably, `RUNPATH` has become the preferred approach for multi-version ROCm installations. This patch lets Triton enumerate the address space with `dl_iterate_phdr` and choose the libamdhip64.so that is already loaded, unless `TRITON_LIBHIP_PATH` is already set. --------- Co-authored-by: Lei Zhang <antiagainst@gmail.com>
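To illustrate the mechanism (a minimal Linux/glibc sketch, not Triton's actual loader code), `dl_iterate_phdr` walks every object mapped into the process and reports its path, so an already-loaded libamdhip64.so can be found regardless of how it was loaded:
```cpp
#include <link.h>   // dl_iterate_phdr (GNU extension)
#include <cstdio>
#include <cstring>

// Callback invoked once per loaded object; returning non-zero stops the walk.
static int findHip(struct dl_phdr_info *info, size_t /*size*/, void *data) {
  if (info->dlpi_name && std::strstr(info->dlpi_name, "libamdhip64.so")) {
    *static_cast<const char **>(data) = info->dlpi_name;
    return 1;
  }
  return 0;
}

int main() {
  const char *path = nullptr;
  dl_iterate_phdr(findHip, &path);
  std::printf("libamdhip64.so: %s\n", path ? path : "not loaded");
}
```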
Cast dot arguments from unsupported FP8 to supported FP16 in order to use MFMA instructions instead of FMA. This approach is expected to give better performance and be more stable than the FMA implementation. --------- Co-authored-by: Lei Zhang <antiagainst@gmail.com>
…on-lang#4571) Co-authored-by: Thomas Raoux <thomas.raoux@openai.com>
Proposing this change because we use these tests for the CPU backend here: https://github.com/microsoft/triton-shared/, and these clauses lead to wrong assumptions about supported features.

The core Triton is a small number of people, and we receive many PRs (thank you!). To help us review your code more quickly, **if you are a new contributor (less than 3 PRs merged) we ask that you complete the following tasks and include the filled-out checklist in your PR description.** Complete the following tasks before sending your PR, and replace `[ ]` with `[x]` to indicate you have done them.
- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because `this is a fix to tests`.
- Select one of the following.
  - [x] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)

Co-authored-by: Renat Idrisov <parsifal-47@users.noreply.github.com>
…ing input and output dimensions (triton-lang#4530) Before this patch, if `a * b = c`, `c.divideRight(b)` might return `nullopt` even if `a' * b = c`, where `a'` is the potential result of `divideRight`. This PR addresses the issue by conservatively removing input and output dimensions, ensuring that the division returns a non-nullopt result when a valid solution exists. However, it does not guarantee that `a` and `a'` will have identical input and output dimensions. In addition, this PR also fixes a bug in `TritonGPURemoveLayoutConversionsPass`. The backward slice should be continued when encountering a free conversion. This includes cases where `c.divideRight(b)` results in a layout that only permutes register values within individual threads. --------- Co-authored-by: Thomas Raoux <thomas.raoux@openai.com>
… 1 (triton-lang#4579) If the subview for the mma operand is in the previous iteration of the loop, it would be skipped by the code that looks for it if it feeds directly into the loop yield. This PR fixes that issue, allowing the subview to be found and enabling pipelining in some cases that were previously skipped.
…ng#4582) When `other` is present we should use it to initialize the register before doing the load, instead of initializing the register with 0. Note that this does add a scoreboard dependency between the `other` def and the load, but the user can remove it by using a select if `other` comes from a high-latency op.
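A scalar C++ analogy of the two lowerings (a sketch of the idea described above, not the generated code):
```cpp
// Before: destination initialized to 0, predicated load, then a select.
int loadWithSelect(const int *ptr, bool mask, int other) {
  int reg = 0;
  if (mask)
    reg = *ptr;              // predicated load
  return mask ? reg : other; // extra select after the load
}

// After: destination pre-initialized with `other`; the predicated load
// overwrites it when the mask is set. The select disappears, at the cost of
// a dependency between the computation of `other` and the load.
int loadWithOtherInit(const int *ptr, bool mask, int other) {
  int reg = other;
  if (mask)
    reg = *ptr;
  return reg;
}
```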
The command in the README isn't correct; I eventually obtained the compile commands with `find python/build -name 'compile_commands.json' | xargs readlink -f`. Although this README improvement is trivial, it should help other newcomers get through the environment setup more easily. --------- Signed-off-by: jayzhan <jayzhan211@gmail.com>
) This PR adds verbosity to assembly code after the LLVM backend passes. This adds references to the source code for both NV and AMD. Additionally, it adds `Kernel Info` at the end of the dump for AMD. For example:
```
; Kernel info:
; codeLenInByte = 7732
; NumSgprs: 24
; NumVgprs: 154
; NumAgprs: 128
; TotalNumVgprs: 284
...
```
According to the [table](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes-ptx-release-history), both CUDA 12.5 and 12.6 use PTX ISA 8.5
- Supported linear layout for the 2nd version of WMMA
- Supported legacy emit-indices-related helper functions for now
- Removed remaining assertions related to WMMAv2

Signed-off-by: Ilya Veselov <iveselov.nn@gmail.com>
…lang#4584) Get ready to separate scheduling out of SWP, so we can move scheduleLoads and schedulePrologueAndEpilogue to a separate scheduling pass. Lowering will happen afterwards inside SWP.
Dump a warning if SWP fails on the inner loop and the dump option is enabled on the command line. --------- Co-authored-by: Zeng Wu <zengwu@fb.com>
…s8->bf16 conversions (triton-lang#4563) Hopper has very low throughput for conversion instructions, which causes these operations to quickly become an ALU bottleneck. Restating them in terms of bitwise ops and SIMD bf16 instructions increases the throughput significantly and translates to meaningful speedups (e.g. 10% end-to-end on one matmul I was looking at). Co-authored-by: Adam Paszke <adam.paszke@gmail.com>
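As an illustration of the "bitwise instead of hardware convert" idea (not this PR's actual bf16 path, which additionally packs the work into SIMD instructions), the simpler fp8e5m2 -> fp16 case is literally an 8-bit shift because e5m2 is a truncated binary16:
```cpp
#include <cstdint>
#include <cstdio>

// fp8e5m2 (1 sign, 5 exponent, 2 mantissa bits) occupies the top byte of an
// IEEE binary16 value, so widening it is a plain shift with no arithmetic.
static inline uint16_t fp8e5m2_to_fp16_bits(uint8_t x) {
  return static_cast<uint16_t>(x) << 8;
}

int main() {
  // 0x3C = 0|01111|00 = 1.0 in e5m2; shifted it becomes 0x3C00 = 1.0 in fp16.
  std::printf("0x%04x\n", fp8e5m2_to_fp16_bits(0x3C));
}
```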
Force-pushed from 061c945 to 5d6033c.
gflegar approved these changes on Sep 2, 2024.
gflegar added a commit that referenced this pull request on Aug 12, 2025:
…lang#7796) Getting a crash internally when running `09-persistent-matmul.py` tutorial, and ASAN reports the following: ``` ==7854==ERROR: AddressSanitizer: heap-use-after-free on address 0x7c884c02e800 at pc 0x557f344112d9 bp 0x7b35908a1840 sp 0x7b35908a1838 READ of size 8 at 0x7c884c02e800 thread T1128 #0 0x557f344112d8 in getNextOperandUsingThisValue third_party/llvm/llvm-project/mlir/include/mlir/IR/UseDefLists.h:43:58 #1 0x557f344112d8 in operator++ third_party/llvm/llvm-project/mlir/include/mlir/IR/UseDefLists.h:322:39 #2 0x557f344112d8 in mlir::ResultRange::UseIterator::operator++() third_party/llvm/llvm-project/mlir/lib/IR/OperationSupport.cpp:613:5 #3 0x557f2ab70625 in mlir::lowerTokenOperations(mlir::Operation*, int, int) third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization/WSLowerToken.cpp:269:27 #4 0x557f2ab70de8 in mlir::doTokenLowering(mlir::triton::FuncOp&, unsigned int) third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization/WSLowerToken.cpp:321:3 #5 0x557f2ab2d018 in mlir::NVGPUWarpSpecializationPass::runOnFuncOp(mlir::triton::FuncOp) third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:99:5 #6 0x557f2ab2c5d6 in operator() third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:55 #7 0x557f2ab2c5d6 in operator() third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:304:7 #8 0x557f2ab2c5d6 in void llvm::function_ref<void (mlir::Operation*)>::callback_fn<std::__u::enable_if<!llvm::is_one_of<mlir::triton::FuncOp, mlir::Operation*, mlir::Region*, mlir::Block*>::value && std::is_same<void, void>::value, void>::type mlir::detail::walk<(mlir::WalkOrder)1, mlir::ForwardIterator, mlir::NVGPUWarpSpecializationPass::runOnOperation()::'lambda'(mlir::triton::FuncOp), mlir::triton::FuncOp, void>(mlir::Operation*, mlir::NVGPUWarpSpecializationPass::runOnOperation()::'lambda'(mlir::triton::FuncOp)&&)::'lambda'(mlir::Operation*)>(long, mlir::Operation*) third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46:12 #9 0x557f2820ce45 in operator() third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69:12 #10 0x557f2820ce45 in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:152:5 #11 0x557f2820ce2c in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:147:9 #12 0x557f2ab2c0c9 in walk<(mlir::WalkOrder)1, mlir::ForwardIterator, (lambda at third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:26), mlir::triton::FuncOp, void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:306:10 #13 0x557f2ab2c0c9 in walk<(mlir::WalkOrder)1, mlir::ForwardIterator, (lambda at third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:26), void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Operation.h:798:12 #14 0x557f2ab2c0c9 in mlir::NVGPUWarpSpecializationPass::runOnOperation() third_party/triton/third_party/nvidia/hopper/lib/Transforms/WarpSpecialization.cpp:108:21 ... ``` The problem seems to be that we are iterating through uses, and then removing some of them inside the loop, which invalidates the iterator.
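A common way to avoid this class of bug in MLIR (a sketch of the general pattern, not necessarily the exact change made in WSLowerToken.cpp) is to snapshot the users before mutating anything, rather than erasing operations while walking the live use list:
```cpp
#include "mlir/IR/Operation.h"
#include "mlir/IR/Value.h"
#include "llvm/ADT/SmallVector.h"

// Hypothetical helper: rewrite all users of `token` without invalidating the
// use-list iterator. Copying the users into a vector first makes it safe to
// erase or replace them inside the loop.
void rewriteTokenUsers(mlir::Value token) {
  llvm::SmallVector<mlir::Operation *> users(token.getUsers().begin(),
                                             token.getUsers().end());
  for (mlir::Operation *user : users) {
    // ... lower or erase `user` here; the snapshot above stays valid ...
  }
}
```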
chsigg pushed a commit that referenced this pull request on Aug 26, 2025:
…leaveTMem.cpp (triton-lang#7924) `TritonNvidiaGPU/interleave_tmem.mlir` fails under address sanitizer. The `ConstantIntOp` operations were created without attachment to any block in http://github.com/triton-lang/triton/pull/7622, which caused a memory leak. This change addresses the problem by adding an insertion point. <details open> <summary>Full log</summary> ================================================================= ==3831==ERROR: LeakSanitizer: detected memory leaks Direct leak of 576 byte(s) in 6 object(s) allocated from: #0 0x55c3eca39164 in malloc [third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:67](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp?l=67&ws=tap-presubmit-server/421956858&snapshot=2):3 #1 0x55c3f176afb3 in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::DictionaryAttr, mlir::OpaqueProperties, mlir::BlockRange, unsigned int) [third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:113](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=113&ws=tap-presubmit-server/421956858&snapshot=2):46 #2 0x55c3f176a90c in create [third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:74](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=74&ws=tap-presubmit-server/421956858&snapshot=2):10 #3 0x55c3f176a90c in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::NamedAttrList&&, mlir::OpaqueProperties, mlir::BlockRange, mlir::RegionRange) [third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:57](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=57&ws=tap-presubmit-server/421956858&snapshot=2):7 #4 0x55c3f176a61b in mlir::Operation::create(mlir::OperationState const&) [third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp:35](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Operation.cpp?l=35&ws=tap-presubmit-server/421956858&snapshot=2):7 #5 0x55c3f1678a78 in mlir::OpBuilder::create(mlir::OperationState const&) [third_party/llvm/llvm-project/mlir/lib/IR/Builders.cpp:453](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/IR/Builders.cpp?l=453&ws=tap-presubmit-server/421956858&snapshot=2):17 #6 0x55c3ecf3668f in mlir::arith::ConstantIntOp mlir::OpBuilder::create<mlir::arith::ConstantIntOp, int, int>(mlir::Location, int&&, int&&) [third_party/llvm/llvm-project/mlir/include/mlir/IR/Builders.h:507](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/include/mlir/IR/Builders.h?l=507&ws=tap-presubmit-server/421956858&snapshot=2):16 #7 0x55c3eefa690a in findBufferAccessMemdescSubview [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:75](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=75&ws=tap-presubmit-server/421956858&snapshot=2):33 #8 0x55c3eefa690a in mlir::triton::nvidia_gpu::(anonymous namespace)::findBufferAccess(mlir::Value) [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:151](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=151&ws=tap-presubmit-server/421956858&snapshot=2):12 #9 
0x55c3eefa70e7 in mlir::triton::nvidia_gpu::(anonymous namespace)::findBufferAccess(mlir::Value) [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:156](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=156&ws=tap-presubmit-server/421956858&snapshot=2):34 #10 0x55c3eefa4c0c in tmemMayAlias [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:173](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=173&ws=tap-presubmit-server/421956858&snapshot=2):28 #11 0x55c3eefa4c0c in sinkOps [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:227](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=227&ws=tap-presubmit-server/421956858&snapshot=2):36 #12 0x55c3eefa4c0c in trySinkOp [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:253](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=253&ws=tap-presubmit-server/421956858&snapshot=2):10 #13 0x55c3eefa4c0c in mlir::triton::nvidia_gpu::TritonNvidiaGPUInterleaveTMemPass::runOnOperation() [third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp:275](https://cs.corp.google.com/piper///depot/google3/third_party/triton/lib/Dialect/TritonNvidiaGPU/Transforms/InterleaveTMem.cpp?l=275&ws=tap-presubmit-server/421956858&snapshot=2):14 #14 0x55c3f1560ad1 in operator() [third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:553](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=553&ws=tap-presubmit-server/421956858&snapshot=2):17 #15 0x55c3f1560ad1 in void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) [third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=46&ws=tap-presubmit-server/421956858&snapshot=2):12 #16 0x55c3f1559920 in operator() [third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=69&ws=tap-presubmit-server/421956858&snapshot=2):12 #17 0x55c3f1559920 in executeAction<mlir::PassExecutionAction, mlir::Pass &> [third_party/llvm/llvm-project/mlir/include/mlir/IR/MLIRContext.h:280](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/include/mlir/IR/MLIRContext.h?l=280&ws=tap-presubmit-server/421956858&snapshot=2):7 #18 0x55c3f1559920 in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) [third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:547](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=547&ws=tap-presubmit-server/421956858&snapshot=2):21 #19 0x55c3f155d46f in runPipeline [third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:619](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=619&ws=tap-presubmit-server/421956858&snapshot=2):16 #20 0x55c3f155d46f in 
mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) [third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:933](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=933&ws=tap-presubmit-server/421956858&snapshot=2):10 #21 0x55c3f155d15b in mlir::PassManager::run(mlir::Operation*) [third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:913](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp?l=913&ws=tap-presubmit-server/421956858&snapshot=2):60 #22 0x55c3ed0a8b20 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:477](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=477&ws=tap-presubmit-server/421956858&snapshot=2):17 #23 0x55c3ed0a8363 in processBuffer [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:553](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=553&ws=tap-presubmit-server/421956858&snapshot=2):12 #24 0x55c3ed0a8363 in operator() [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:642](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=642&ws=tap-presubmit-server/421956858&snapshot=2):12 #25 0x55c3ed0a8363 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::MemoryBufferRef const&, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::MemoryBufferRef const&, llvm::raw_ostream&) [third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=46&ws=tap-presubmit-server/421956858&snapshot=2):12 triton-lang#26 0x55c3f17bd34f in operator() [third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:69](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h?l=69&ws=tap-presubmit-server/421956858&snapshot=2):12 triton-lang#27 0x55c3f17bd34f in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::MemoryBufferRef const&, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) [third_party/llvm/llvm-project/mlir/lib/Support/ToolUtilities.cpp:30](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Support/ToolUtilities.cpp?l=30&ws=tap-presubmit-server/421956858&snapshot=2):12 triton-lang#28 0x55c3ed09d0c6 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) 
[third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:647](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=647&ws=tap-presubmit-server/421956858&snapshot=2):26 triton-lang#29 0x55c3ed09d67f in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:693](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=693&ws=tap-presubmit-server/421956858&snapshot=2):14 triton-lang#30 0x55c3ed09dc59 in mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) [third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:709](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp?l=709&ws=tap-presubmit-server/421956858&snapshot=2):10 triton-lang#31 0x55c3eca74a70 in main [third_party/triton/bin/triton-opt.cpp:14](https://cs.corp.google.com/piper///depot/google3/third_party/triton/bin/triton-opt.cpp?l=14&ws=tap-presubmit-server/421956858&snapshot=2):33 triton-lang#32 0x7f1fd58613d3 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x613d3) (BuildId: 9a996398ce14a94560b0c642eb4f6e94) triton-lang#33 0x55c3ec995aa9 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120 </details> --------- Co-authored-by: Thomas Raoux <thomas.raoux@openai.com>
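For context, the usual way to guarantee that new constants end up attached to a block (a sketch of the pattern, assuming a nearby `op`; the real fix is in InterleaveTMem.cpp) is to construct the builder with an explicit insertion point:
```cpp
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/IR/Builders.h"

// Building the OpBuilder from an existing operation sets its insertion point
// to just before that operation, so the ConstantIntOp created below is
// inserted into a block instead of floating (and leaking) as in the report.
mlir::Value createZeroBefore(mlir::Operation *op) {
  mlir::OpBuilder b(op);
  return b.create<mlir::arith::ConstantIntOp>(op->getLoc(), /*value=*/0,
                                              /*width=*/32);
}
```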
The `highestPowOf2Divisor` function here https://github.com/triton-lang/triton/blob/8e63999da7ef87fc6ac908f26a9b9c05ce85ab70/include/triton/Dialect/Triton/IR/Utility.h#L33 runs during analysis when the coalesce pass is invoked. This function underflows if the given input is INT_MIN and fails when run under ASAN.
This change handles this edge case and adds a minimal lit test to make sure the case is covered.
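For context, a minimal sketch of the failure mode and one possible guard, assuming the usual `n & ~(n - 1)` formulation from the linked Utility.h (the real fix and its lit test live in this PR):
```cpp
#include <limits>

// For n == INT_MIN, evaluating n - 1 wraps below the representable range
// (the underflow in the title), which is undefined behavior for signed
// integers and what the sanitizer flags. One possible guard is to treat the
// minimum value like the existing n == 0 case and return the largest power
// of two the analysis uses.
template <typename T>
T highestPowOf2DivisorSketch(T n) {
  const T maxPow2 = static_cast<T>(1) << (sizeof(T) * 8 - 2);
  if (n == 0 || n == std::numeric_limits<T>::min())
    return maxPow2;
  return n & (~(n - 1));
}
```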