[TorchToTosa] Add lowering for AtenSortOp by alosim01 · Pull Request #4581 · llvm/torch-mlir

alosim01 · 2026-05-26T13:46:58Z

Summary

Adds TorchToTosa lowering support for torch.aten.sort.

The lowering supports statically shaped ranked tensors with floating-point element types,
a constant dim, and a constant descending flag. It emits repeated tosa.argmax/tosa.gather
selections, masks selected elements with a sentinel, and returns both sorted values and
indices.

This also handles the common decomposed topk pattern where sort results are immediately
prefix-sliced, lowering only the requested prefix instead of sorting the full dimension.
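The selection strategy described above can be sketched in plain Python for a 1-D input. This is an illustrative model, not the TOSA lowering itself: Python's max() stands in for tosa.argmax, list indexing stands in for tosa.gather, and the function name selection_sort is hypothetical. The sketch assumes finite inputs, since a real -inf or NaN would collide with the -inf sentinel.

```python
import math
from typing import List, Optional, Tuple


def selection_sort(values: List[float], k: Optional[int] = None,
                   descending: bool = False) -> Tuple[List[float], List[int]]:
    """Sort a 1-D list by repeated arg-max selection with sentinel masking.

    Each step picks the index of the current maximum of a working copy
    (the argmax), reads the value at that index from the original input
    (the gather), and overwrites the chosen slot with a -inf sentinel so it
    cannot be chosen again. Running only k steps yields the sorted prefix.
    Assumes finite inputs; -inf or NaN in the input is not handled here.
    """
    n = len(values)
    steps = n if k is None else k
    # Ascending order reuses the same arg-max by negating the working copy.
    work = [v if descending else -v for v in values]
    out_values: List[float] = []
    out_indices: List[int] = []
    for _ in range(steps):
        # max() returns the first maximal index, so ties pick the lowest index.
        best = max(range(n), key=lambda i: work[i])
        out_indices.append(best)
        out_values.append(values[best])
        # Mask the selected element so later steps skip it.
        work[best] = -math.inf
    return out_values, out_indices
```

For example, selection_sort([3.0, 1.0, 2.0], k=2, descending=True) returns ([3.0, 2.0], [0, 2]). Each selection step scans the whole dimension, so a full sort costs O(n²) work; stopping after k steps is what makes the prefix (top-k) case cheaper.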

Details

- Lowers full aten.sort for ascending and descending order.
- Supports sorting along non-last dimensions by transposing to the selection dimension and transposing back.
- Produces i64 indices when required by the Torch result type.
- Handles rank-zero tensors by returning the input value and index 0.
- Tightens AtenSortOp folding so that invalid static dims do not fold incorrectly.
- Adds regression tests for full sort, decomposed top-k prefix slicing, rank-zero sort, and invalid-dim no-fold behavior.
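The transpose strategy for non-last dimensions and the rank-zero rule can be sketched in plain Python for the rank-2 case. This is an illustration under stated assumptions, not the TOSA lowering: Python's stable sorted() stands in for the argmax/gather selection loop, and the names transpose, sort_2d, and sort_scalar are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def sort_2d(m: Matrix, dim: int,
            descending: bool = False) -> Tuple[Matrix, List[List[int]]]:
    """Sort a rank-2 list of lists along dim by sorting rows of a transposed view."""
    rank = 2
    if dim < 0:
        dim += rank
    if not 0 <= dim < rank:
        raise ValueError(f"dim {dim} out of range for rank {rank}")
    # Bring the sort dimension last; for rank 2 that is a plain transpose.
    rows = m if dim == rank - 1 else transpose(m)
    values, indices = [], []
    for row in rows:
        # sorted() is stable, so equal elements keep their original order.
        order = sorted(range(len(row)), key=lambda i: row[i], reverse=descending)
        indices.append(order)
        values.append([row[i] for i in order])
    if dim != rank - 1:
        # Transpose both results back to the input layout.
        values, indices = transpose(values), transpose(indices)
    return values, indices


def sort_scalar(value: float, dim: int) -> Tuple[float, int]:
    """Rank-zero case: only dim 0 or -1 is accepted, and the index is always 0."""
    if dim not in (0, -1):
        raise ValueError("scalar sort dim is invalid")
    return value, 0
```

For example, sort_2d([[3.0, 1.0], [2.0, 4.0]], 0) sorts each column and returns ([[2.0, 1.0], [3.0, 4.0]], [[1, 0], [0, 1]]).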

Lallapallooza

Thanks for the patch, a few suggestions.

Lallapallooza

Thanks for the update, a few questions.

Lallapallooza · 2026-06-25T12:40:15Z

@@ -281,15 +281,46 @@ static FailureOr<Value> createIntOrFloatCompareOp(PatternRewriter &rewriter,
  }

  if (isa<mlir::FloatType>(elementType)) {


Can we split the TMTensor comparator change out of this TOSA sort PR?

Lallapallooza · 2026-06-25T13:31:01Z

+  Type elementTy = selfTy.getElementType();
+  if (!elementTy.isF32() && !elementTy.isF16() && !elementTy.isBF16())
+    return rewriter.notifyMatchFailure(
+        op, "only f32, f16, and bf16 element types are supported");
+
+  bool descending;
+  if (!matchPattern(op.getDescending(), m_TorchConstantBool(&descending)))
+    return rewriter.notifyMatchFailure(
+        op, "unimplemented: only constant descending value is supported");
+
+  int64_t dim;
+  if (!matchPattern(op.getDim(), m_TorchConstantInt(&dim)))
+    return rewriter.notifyMatchFailure(
+        op, "unimplemented: only constant dim value is supported");
+
+  int64_t rank = selfTy.getRank();
+  if (rank == 0) {
+    if (dim != 0 && dim != -1)
+      return rewriter.notifyMatchFailure(op, "scalar sort dim is invalid");
+
+    auto indicesTy = cast<RankedTensorType>(
+        getTypeConverter()->convertType(op.getResult(1).getType()));
+    Value indices = tosa::getConstTensor<int32_t>(rewriter, op, 0, {}).value();
+    if (indicesTy.getElementType().isInteger(64))
+      indices = tosa::CastOp::create(rewriter, loc, indicesTy, indices);
+    else if (indices.getType() != indicesTy)
+      indices = tensor::CastOp::create(rewriter, loc, indicesTy, indices);
+    rewriter.replaceOp(op, {self, indices});
+    return success();


The scalar sort fast path comes after the float-only dtype check, so rank-zero integer inputs are rejected before they reach the branch that simply returns the input. This path does not need a numeric comparison. Can we move the rank-zero branch before the element-type gate?
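A minimal Python sketch of the requested check order, with element types modeled as strings. That modeling and the name classify_sort are assumptions for illustration; the real pattern inspects MLIR types. It only shows which branch each input reaches.

```python
SUPPORTED_FLOAT_TYPES = {"f32", "f16", "bf16"}


def classify_sort(rank: int, element_type: str, dim: int) -> str:
    """Hypothetical check order: rank-zero branch first, dtype gate second."""
    if rank == 0:
        if dim not in (0, -1):
            return "fail: scalar sort dim is invalid"
        # No comparison is performed, so any element type can take this path.
        return "rank-zero fast path"
    if element_type not in SUPPORTED_FLOAT_TYPES:
        return "fail: only f32, f16, and bf16 element types are supported"
    return "general lowering"
```

With this ordering, a rank-zero i32 input reaches the fast path, while a rank-2 i32 input is still rejected by the dtype gate.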

Lallapallooza · 2026-06-25T13:34:19Z

+    auto parsePrefixSlice = [&](AtenSliceTensorOp slice,
+                                int64_t &sliceK) -> bool {
+      int64_t sliceDim;
+      if (!matchPattern(slice.getDim(), m_TorchConstantInt(&sliceDim)))
+        return false;
+      sliceDim = toPositiveDim(sliceDim, rank);
+      if (sliceDim != dim)
+        return false;
+      int64_t start;
+      if (!matchPattern(slice.getStart(), m_TorchConstantInt(&start)) ||
+          start != 0)
+        return false;
+      int64_t step;
+      if (!matchPattern(slice.getStep(), m_TorchConstantInt(&step)) ||
+          step != 1)
+        return false;
+      if (!matchPattern(slice.getEnd(), m_TorchConstantInt(&sliceK)))
+        return false;
+      return sliceK > 0 && sliceK <= dimSize;
+    };


topk(k=0) is valid in PyTorch and should return empty values plus empty int64 indices, but this lowering misses that case. After decomposition, topk becomes a sort followed by slices ending at 0, and parsePrefixSlice rejects them because it requires sliceK > 0. As a result, large inputs can fall back to a full sort and hit the 128-element cap, while smaller inputs still run into the zero-sized-output rejection. Can we add a direct k == 0 fast path, plus test coverage for it?
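A Python sketch of a prefix-slice parser that accepts k == 0, mirroring the checks in the reviewed lambda. SliceArgs, parse_prefix_slice, and prefix_result_shape are hypothetical names, and the slice operands are assumed to be already-matched constants.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SliceArgs:
    """Hypothetical stand-in for constant aten.slice.Tensor operands."""
    dim: int
    start: int
    end: int
    step: int


def parse_prefix_slice(s: SliceArgs, rank: int, sort_dim: int,
                       dim_size: int) -> Optional[int]:
    """Return k if the slice is [0:k:1] along sort_dim, otherwise None.

    Unlike the reviewed lambda, k == 0 is accepted so that the caller can
    route it to a dedicated empty-result path.
    """
    slice_dim = s.dim + rank if s.dim < 0 else s.dim
    if slice_dim != sort_dim or s.start != 0 or s.step != 1:
        return None
    return s.end if 0 <= s.end <= dim_size else None


def prefix_result_shape(shape: List[int], sort_dim: int, k: int) -> List[int]:
    """Static shape shared by the values and indices results."""
    out = list(shape)
    # k == 0 gives a zero-sized tensor, so a fast path can emit empty
    # results without running any selection step.
    out[sort_dim] = k
    return out
```

For a [4, 6] input sliced to 0 along the last dim, parse_prefix_slice returns 0 and the fast path would produce [4, 0] values and indices.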

Lallapallooza · 2026-06-25T13:36:56Z

+    if (dimInt < 0)
+      dimInt += rank;
+    if (dimInt < 0 || dimInt >= rank)
+      return failure();


Is the second if always true whenever the first one is true?
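A direct Python transcription of the two checks in the snippet (normalize_dim is a hypothetical name) makes the cases easy to enumerate; returning None corresponds to `return failure();`.

```python
from typing import Optional


def normalize_dim(dim: int, rank: int) -> Optional[int]:
    """Same two-step logic as the snippet: wrap negatives, then range-check."""
    if dim < 0:
        dim += rank
    if dim < 0 or dim >= rank:
        return None
    return dim
```

Here -1 with rank 3 wraps to 2 and passes the second check, while -4 is still negative after wrapping and 3 never wraps but is out of range, so those two reach the failure.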

Lallapallooza · 2026-06-25T13:38:30Z

+    @export
+    @annotate_args([None, ([1, 6], torch.float32, True)])
+    def forward(self, x):
+        return torch.ops.aten.topk(x, k=6, dim=-1, largest=False, sorted=True)


This NaN/Inf smallest-topk test uses k=6 on a length-6 input, so it mostly exercises full ordering rather than partial top-k selection. Can we change this test or add a second one with k < dim_size so the NaN-aware top-k behavior is actually covered?
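A pure-Python reference for the expected result of a k < dim_size smallest-topk test. It assumes PyTorch's convention that NaN orders above +inf, and it breaks ties by lower index; both are assumptions of this sketch, and smallest_topk_reference is a hypothetical name.

```python
import math
from typing import List, Tuple


def smallest_topk_reference(values: List[float],
                            k: int) -> Tuple[List[float], List[int]]:
    """Expected output of topk(largest=False, sorted=True) on a 1-D row.

    Assumes NaN orders above +inf, so NaNs are picked only after every
    other value. Ties are broken by the lower index in this reference.
    """
    def key(i: int):
        v = values[i]
        nan = math.isnan(v)
        # Tuple key: non-NaN first, then by value, then by index.
        return (nan, 0.0 if nan else v, i)

    order = sorted(range(len(values)), key=key)[:k]
    return [values[i] for i in order], order
```

For x = [nan, 3.0, -inf, 1.0, inf, 0.5] and k = 3, the reference gives [-inf, 0.5, 1.0] at indices [2, 5, 3]. That case exercises a partial selection in which both the NaN and +inf must be excluded, which the k=6 test never reaches.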

Lallapallooza · 2026-06-25T13:38:56Z

  SmallVector<mlir::Complex<APFloat>> values;
  for (auto i : llvm::seq<unsigned>(0, matrixType.getDimSize(0))) {
    for (auto j : llvm::seq<unsigned>(0, matrixType.getDimSize(1))) {
      double v = scale * i * j;
      double realV = cos(v);
      double imagV = -sin(v);

      bool unused;
      APFloat real(realV);
      real.convert(floatType.getFloatSemantics(), APFloat::rmNearestTiesToEven,
                   &unused);
      APFloat imag(imagV);
      imag.convert(floatType.getFloatSemantics(), APFloat::rmNearestTiesToEven,
                   &unused);

      values.push_back(mlir::Complex<APFloat>(real, imag));


Why do we need these changes?

Lallapallooza · 2026-06-25T13:39:29Z

-  if (dimInt < 0)
-    dimInt += operandType.getSizes().size();
-  if (dimAttribute) {
+  int64_t rank = operandType.getSizes().size();


Could you please explain why the old logic was wrong and why the new logic is correct?

alosim01 force-pushed the add-aten-sort-op branch from 5fb0d05 to 3a5ccc4 May 27, 2026 09:23

Lallapallooza requested changes May 27, 2026


alosim01 force-pushed the add-aten-sort-op branch 2 times, most recently from bca17c9 to 5c4b7ba June 16, 2026 09:04

[TorchToTosa] Add lowering for AtenSortOp

9fad881

alosim01 force-pushed the add-aten-sort-op branch from 5c4b7ba to 9fad881 June 16, 2026 14:03

alosim01 added 4 commits June 17, 2026 12:55

add AtenSortOp NaN-aware ordering

bb249e6

Update ONNX xfails for TopK NaN fixes

900e235

Format TorchToTMTensor TopK comparator

0254220

Xfail large TopK for FX importer TOSA

9cc4729

Lallapallooza requested changes Jun 25, 2026
