Aref Automatic Warp Specialization [AutoWS] Implementation #6689

Open: wants to merge 1 commit into base: aref_auto_ws

@3gx commented May 2, 2025

Context: In addition to the AutoWS implementations in the release/* and main branches, we (see the list of contributors below) have been working on our own implementation of AutoWS using aref abstractions. We would like to share it with the community in a separate branch, aref_auto_ws.

Goal: We’re sharing this implementation to facilitate discussion on what abstractions are helpful for automatic warp specialization. We will continue improving this branch to prove out the performance, ergonomics and flexibility for complex workloads that the abstractions here enable.

At the same time, we plan to take (and have already been taking) portions of this branch and port them into main under third_party/nvidia as they mature and prove their utility on various workloads.

Note: This pull request is primarily for information and visibility, but feedback is appreciated and will be considered as we continue development on this branch.

Contributors to this implementation:
@3gx, @acollins3, @binarybana, @BinFan, @chhzh123, @CliveUnger, @csullivan, @masahi, @mbrookhart, @vinodgro

@3gx requested review from ptillet and Jokeren as code owners on May 2, 2025 21:18
auto barrierOp = mlir::insertBarrier(*builder, op->getLoc());
}

bool MembarAnalysis::isBarrier(Operation *op) {
Contributor

return isa<gpu::BarrierOp, NVVM::BarrierOp>(op);

@@ -25,7 +26,28 @@ struct MakeRangeOpConversion
auto elemTy = ty.getElementType();
assert(elemTy.isInteger(32));
Value start = createIndexAttrConstant(rewriter, loc, elemTy, op.getStart());
auto idxs = emitIndices(loc, rewriter, targetInfo, layout, ty, true);
std::optional<int> warpGroupStart;
if (!getWarpGroupStart(rewriter.getInsertionBlock())) {
Contributor

This is quite hacky

Author

This is a workaround for a limitation of our partitioner, which doesn't put the make_range op into a warp_group region. We actually like Meta's partitioner and hope to transition to it; that should resolve this issue and make the workaround unnecessary.
