Insights: triton-lang/triton
Overview
64 Pull requests merged by 26 people
-
[Pipeliner] Skip async_wait when there is no async_cp op
#6681 merged
May 3, 2025 -
[NFC] Remove `#include "triton/Analysis/Utility.h"` from `LoopUnroll.cpp` as unused
#6684 merged
May 3, 2025 -
[BACKEND] backwardRematerialization cost model
#6667 merged
May 2, 2025 -
[Blackwell] Fix the TMEM message heuristic
#6692 merged
May 2, 2025 -
Support hoist ConvertDot Operand when the leaves of the slice are DescriptorLoadOp
#6690 merged
May 2, 2025 -
[BUILD] Abort installation for Python <= 3.8
#6649 merged
May 2, 2025 -
[Blackwell] Fix TMEM message size selection heuristic
#6687 merged
May 2, 2025 -
[Backend] Use `NVVM::MapaOp` over call intrinsic (NFC)
#6682 merged
May 2, 2025 -
[Blackwell] Fix TMEM allocation when captured buffer is not a subview
#6686 merged
May 2, 2025 -
[BACKEND] Fixes removeZeroBasesAlongDim outDims and surjectivity check
#6669 merged
May 2, 2025 -
Reorder tma lowering pass after fence insertion.
#6679 merged
May 2, 2025 -
Disable test_nvidia_tool for AMD and simplify default ptxas check
#6676 merged
May 2, 2025 -
[Testing] Update test_compile_stats to allow custom library compiler paths
#6672 merged
May 2, 2025 -
[NFC] Factor core matrix layout out of nvmma conversion for clarity
#6677 merged
May 2, 2025 -
Avoid calling self.knobs during reset
#6673 merged
May 2, 2025 -
[PROTON-DEV] Trace reader for high-level parsing
#6662 merged
May 2, 2025 -
[lit][cfg] Declare mlir-translate in lit config
#6671 merged
May 2, 2025 -
[NVIDIA] WGMMA prefetch pass: Support asyn_token in LocalLoad
#6640 merged
May 2, 2025 -
[BACKEND] Fix wrong linear layout for 3D transposed nvmma
#6668 merged
May 1, 2025 -
[BACKEND] Fix tma lowering crash due to wrong insert point
#6666 merged
May 1, 2025 -
[NFC][Layouts] Cleanup leading offset references + exponential iteration
#6665 merged
May 1, 2025 -
[Blackwell] Refactor/slightly generalize warp specialization
#6597 merged
May 1, 2025 -
[BACKEND] Support memdesc_reshape to allow different HBM layout for mmav5
#6482 merged
May 1, 2025 -
[LAYOUTS] Improve split inference and fix comments
#6663 merged
May 1, 2025 -
[python][compiler] Implement CompilationListener to report compile times
#5957 merged
May 1, 2025 -
[Analysis] divisibility handling for dividing by a power-of-two constant
#6657 merged
May 1, 2025 -
[Hopper][WS] Automatic task partition for anchor ops
#6658 merged
Apr 30, 2025 -
[TEST] Enable test-alignment.mlir checks & remove failing checks
#6661 merged
Apr 30, 2025 -
[TEST][AMD] Only check kpack=1 on the cdna4
#6651 merged
Apr 30, 2025 -
[AMD][BACKEND] Filter barriers in `Membar` between `LocalLoads` and `AsyncCopy` when pipelining
#6639 merged
Apr 30, 2025 -
Update README.md for vscode intellisense
#6648 merged
Apr 30, 2025 -
Revert "[BE][PIPELINE] Enabling mmav5 pipelining for 2 dots in the loop by default (#6599)"
#6653 merged
Apr 30, 2025 -
Revert "[LAYOUTS] Use divideLeft on layout inference (#6577)"
#6652 merged
Apr 30, 2025 -
[RELAND][TritonNVIDIAGPU] Add missing memory effects to some ops (#6518)
#6644 merged
Apr 29, 2025 -
[PROTON-DEV] Fill in some AMD op lowerings
#6604 merged
Apr 29, 2025 -
[Dialect] Mark Join and Split as pure
#6645 merged
Apr 29, 2025 -
Revert "[LAYOUTS] Fix mixed precision swizzling (#6565)"
#6643 merged
Apr 29, 2025 -
RFC [python] Rename config.py > knobs.py
#6641 merged
Apr 29, 2025 -
[NVIDIA] Enable Programmatic Dependent Launch in Triton
#6394 merged
Apr 29, 2025 -
[Backend][Hopper] Add a skeleton third-party warp specialization pass
#6624 merged
Apr 29, 2025 -
Use symlinks for external plugins to fix TRITON_PLUGIN_DIRS
#6627 merged
Apr 29, 2025 -
[NFC] Split Membar tests into common and ttng specific files
#6637 merged
Apr 29, 2025 -
[python][typehint] Simplify find_paths_if and add typehints to _utils
#6497 merged
Apr 29, 2025 -
[python][typing] Specify mypy_path in pyproject.toml
#6596 merged
Apr 29, 2025 -
[PROTON-DEV] Trace binary decoder and event parser
#6603 merged
Apr 29, 2025 -
[BENCH] Ignore egg-info files
#6631 merged
Apr 29, 2025 -
[Remarks][MMA] Add remark for not applying MMAv5 and improve wording on remark of MMAv3
#6528 merged
Apr 29, 2025 -
[BACKEND] Fix dot-operand pattern
#6632 merged
Apr 29, 2025 -
[TritonNVIDIAGPU] Add dependency tokens to TMEM ops
#6520 merged
Apr 29, 2025 -
[AMD][BACKEND] Schedule `AsyncWait` in front of `AsyncCopy` and `LocalLoad`
#6621 merged
Apr 29, 2025 -
[AMD][BACKEND] Add alias information to `other` stores from `AsyncCopy/BufferLoadToLocal`
#6619 merged
Apr 29, 2025 -
[BENCH] Remove use of the unpack operator to make scripts compatible with Python earlier than 3.11.
#6630 merged
Apr 29, 2025 -
[NVIDIA] WGMMA subtiling: distribute NumImpreciseAcc across subtiles when NumImpreciseAcc > 0
#6469 merged
Apr 28, 2025 -
[Dialect] Mark Join, Split, and Cat as non speculatable
#6629 merged
Apr 28, 2025 -
Fix TensorMemoryAllocation to correctly implement getAlloc
#6614 merged
Apr 28, 2025 -
[python][config] Add cache.home_dir config
#6626 merged
Apr 28, 2025 -
[AMD] Disable warp specialization tests for AMD target
#6620 merged
Apr 28, 2025 -
[BACKEND] Fix fence insertion analysis
#6622 merged
Apr 28, 2025 -
[python] Introduce config module for all env vars/hooks
#6467 merged
Apr 28, 2025 -
Add deduction guide to SmallVector
#6618 merged
Apr 28, 2025 -
Expose max_threads_per_block to avoid launch failures and improve autotune robustness
#6522 merged
Apr 28, 2025 -
Support `None` in getitem slices
#6616 merged
Apr 28, 2025 -
[BACKEND] Do not use C++20 designated initializers
#6615 merged
Apr 28, 2025
15 Pull requests opened by 14 people
-
Add `TRITON_IGNORE_LIBTRITON_HASH` for hash stability between Python versions
#6617 opened
Apr 28, 2025 -
[AMD] Remove f8 dtypes in type checking for gfx12
#6628 opened
Apr 28, 2025 -
[Python:compiler] Replace readbytes with MMAP
#6650 opened
Apr 30, 2025 -
[AMD] Remove is within_2gb check for specialization
#6655 opened
Apr 30, 2025 -
[AMD][LLVM] unpack `fmul`/`fadd` near MFMA
#6656 opened
Apr 30, 2025 -
[WIP][DNR] Optimize partitioning and scheduling for attention
#6660 opened
Apr 30, 2025 -
[knobs] Fix environment propagation & scope() API
#6664 opened
May 1, 2025 -
[frontend] Fix autotune cache lookup when interpreter enabled
#6678 opened
May 2, 2025 -
[Cuda] Small test fixes for Consumer Blackwell Cards and Hopper variants
#6680 opened
May 2, 2025 -
Add proton profiling to tutorials 06 and 08
#6685 opened
May 2, 2025 -
[AMD] Optimize to use 128-bit stores in epilogue for CDNA4
#6688 opened
May 2, 2025 -
Aref Automatic Warp Specialization [AutoWS] Implementation
#6689 opened
May 2, 2025 -
[PROTON-DEV] Time shift trick
#6693 opened
May 3, 2025 -
[Backend] Improve how dynamic register reallocation is implemented
#6694 opened
May 3, 2025 -
Add support for masked histograms
#6695 opened
May 3, 2025
5 Issues closed by 5 people
-
Hopper: For TMA load, the layout LHS does not hoist for RS (register-smem) WGMMA
#6646 closed
May 2, 2025 -
Triton matmul_ogs Kernel missing multiplying expert weight
#6527 closed
May 1, 2025 -
ROCm: ImportError: cannot import name 'intel' from 'triton._C.libtriton
#6634 closed
Apr 29, 2025 -
Adding third party TRITON_PLUGIN_DIRS is broken
#6612 closed
Apr 29, 2025 -
errors introduced by scalars in Interpreter mode
#5965 closed
Apr 27, 2025
5 Issues opened by 5 people
-
Ordering of lines matters with num_stages > 1.
#6691 opened
May 2, 2025 -
Reduction is duplicated in TTIR -> TTGIR with num_stages>1 causing strange inconsistencies
#6647 opened
Apr 29, 2025 -
TMA Store Non-deterministic results
#6638 opened
Apr 29, 2025 -
[Feature Request] Print autotune keys with TRITON_PRINT_AUTOTUNING=1
#6636 opened
Apr 29, 2025 -
Triton tl.argmax Accuracy Discrepancy in Top-K Selection, when input contains infinity number
#6635 opened
Apr 29, 2025
14 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[AMD] Add Concat op to AMDGPU dialect
#6590 commented on
May 2, 2025 • 18 new comments -
[LAYOUTS] Generic stmatrix lowering
#6609 commented on
Apr 29, 2025 • 13 new comments -
[BACKEND] BF16 atomic_add support
#6519 commented on
May 3, 2025 • 8 new comments -
Add default pre_run hooks as a JITFunction class variable
#6434 commented on
Apr 28, 2025 • 2 new comments -
[AMD] Relax conditions in ExtractSlice verifier
#6417 commented on
May 1, 2025 • 1 new comment -
xFormers for CUDA 12.8 : AttributeError: Cannot set attribute 'src' directly. Use '_unsafe_update_src()' and manually clear .hash of all callersinstead.
#6123 commented on
Apr 26, 2025 • 0 new comments -
SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
#5919 commented on
Apr 30, 2025 • 0 new comments -
tl.dot on transposed matrix tries to rearrange matrix in shared memory
#6569 commented on
Apr 30, 2025 • 0 new comments -
fail instal on arm platform
#6606 commented on
May 1, 2025 • 0 new comments -
Add Structured Tracing Mechanism for Triton Kernel Compilation
#6364 commented on
Apr 30, 2025 • 0 new comments -
[Draft] Enable `atomic_add` for `bf16`
#6418 commented on
Apr 28, 2025 • 0 new comments -
[PROTON] Simplify proton viewer APIs for bench_mlp analysis
#6452 commented on
Apr 30, 2025 • 0 new comments -
[WIP][AMD]Aggregated load of scales in DotScaledOp
#6529 commented on
Apr 29, 2025 • 0 new comments -
[PIPELINE] Enable MMAv5 pipelining for multi-dot loops by default
#6613 commented on
May 2, 2025 • 0 new comments