Aref Automatic Warp Specialization [AutoWS] Implementation #6689

3gx · 2025-05-02T21:18:12Z

Context: In addition to AutoWS implementations in the release/* and main branches, we (see list of contributors below) have been working on our own implementation of AutoWS using aref abstractions, and we would like to share this with the community in a separate branch aref_auto_ws.

Goal: We’re sharing this implementation to facilitate discussion on what abstractions are helpful for automatic warp specialization. We will continue improving this branch to prove out the performance, ergonomics and flexibility for complex workloads that the abstractions here enable.

In parallel, we have been porting portions of this branch into main under third_party/nvidia, and we plan to keep doing so as they mature and prove their utility on various workloads.

Note: This pull request is primarily for information and visibility, but feedback is appreciated and will be considered as we continue development on this branch.

Contributors to this implementation:
@3gx, @acollins3, @binarybana, @BinFan, @chhzh123, @CliveUnger, @csullivan, @masahi, @mbrookhart, @vinodgro

Jokeren · 2025-05-02T21:35:17Z

lib/Analysis/Membar.cpp

+  auto barrierOp = mlir::insertBarrier(*builder, op->getLoc());
+}
+
+bool MembarAnalysis::isBarrier(Operation *op) {


return isa<gpu::BarrierOp, NVVM::BarrierOp>(op);

Jokeren · 2025-05-02T21:35:45Z

lib/Conversion/TritonGPUToLLVM/MakeRangeOpToLLVM.cpp

@@ -25,7 +26,28 @@ struct MakeRangeOpConversion
    auto elemTy = ty.getElementType();
    assert(elemTy.isInteger(32));
    Value start = createIndexAttrConstant(rewriter, loc, elemTy, op.getStart());
-    auto idxs = emitIndices(loc, rewriter, targetInfo, layout, ty, true);
+    std::optional<int> warpGroupStart;
+    if (!getWarpGroupStart(rewriter.getInsertionBlock())) {


This is quite hacky

3gx · 2025-05-02

This is a workaround for a limitation of our partitioner, which doesn't put the make_range op into a warp_group region. We actually like Meta's partitioner and hope to transition to it; once we do, this issue should be resolved and the workaround will no longer be needed.
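For readers unfamiliar with why a range lowering might care about a warp-group start at all, here is a toy model in plain C++. It is not Triton's emitIndices and does not use MLIR. The names ownedIndex and rangeForGroup, the one-element-per-thread distribution, and the warp size of 32 are all assumptions made for illustration. The sketch only shows the general idea that a per-thread index derived from an absolute thread id comes out shifted when the warp group does not start at warp 0, unless the id is first rebased to the start of the group. Whether the workaround in this PR does exactly this is not stated in the thread.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Assumed warp size for this toy model.
constexpr int kWarpSize = 32;

// Hypothetical helper: the element index a thread owns when a 1-D range is
// distributed one element per thread. If a warp-group start (in warps) is
// known, the thread id is rebased to the first thread of that group.
int ownedIndex(int absoluteThreadId, std::optional<int> warpGroupStart) {
  int firstThreadOfGroup = warpGroupStart.value_or(0) * kWarpSize;
  return absoluteThreadId - firstThreadOfGroup;
}

// Hypothetical helper: collect the indices produced by every thread of a
// warp group that spans numWarps warps beginning at warp startWarp.
std::vector<int> rangeForGroup(int startWarp, int numWarps,
                               std::optional<int> warpGroupStart) {
  std::vector<int> indices;
  int firstThread = startWarp * kWarpSize;
  int lastThread = (startWarp + numWarps) * kWarpSize;
  for (int tid = firstThread; tid < lastThread; ++tid)
    indices.push_back(ownedIndex(tid, warpGroupStart));
  return indices;
}
```

In this model, a group of 4 warps starting at warp 4 yields indices 0 through 127 when the start is supplied, and indices 128 through 255 when it is left as std::nullopt, which is the kind of offset a range op placed outside its warp_group region would have to correct for.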

Aref AutoWS Implementation for Triton

7ff8e01

3gx requested review from ptillet and Jokeren as code owners May 2, 2025 21:18

Jokeren reviewed May 2, 2025
