
[mlir][xegpu] Refine layout assignment in XeGPU SIMT distribution. #142687

Open
charithaintc wants to merge 27 commits into main

Conversation

@charithaintc (Contributor) commented Jun 3, 2025:

Changes:

  • Decouple layout propagation from subgroup distribution and move it to an independent pass.
  • Refine layout assignment to handle control-flow ops correctly (scf.for); a conceptual sketch of the value correspondence this requires follows the list.
  • Refine test cases.
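
As a rough illustration of the scf.for handling above, the sketch below shows the value correspondence the layout assignment has to respect. It is conceptual only; the helper callbacks are hypothetical, and the actual implementation uses the sparse dataflow framework.

```cpp
// Conceptual sketch (not the pass code): for scf.for, result #i, region
// iter_arg #i, init operand #i, and yielded operand #i are the same
// loop-carried value, so a layout decided for any one of them is mirrored
// to the others. `getLayout`/`setLayout` are hypothetical callbacks.
static void unifyForOpLayouts(scf::ForOp forOp,
                              llvm::function_ref<LayoutInfo(Value)> getLayout,
                              llvm::function_ref<void(Value, LayoutInfo)> setLayout) {
  auto yieldOp = cast<scf::YieldOp>(forOp.getBody()->getTerminator());
  for (auto [result, iterArg, init, yielded] :
       llvm::zip(forOp.getResults(), forOp.getRegionIterArgs(),
                 forOp.getInitArgs(), yieldOp.getOperands())) {
    LayoutInfo layout = getLayout(result);
    setLayout(iterArg, layout);
    setLayout(init, layout);
    setLayout(yielded, layout);
  }
}
```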

@charithaintc marked this pull request as ready for review June 5, 2025 21:15
@charithaintc (Contributor, Author) commented:

@Garra1980 @Jianhui-Li Can you please take a look?

Comment on lines +52 to +685
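The visible part of the excerpt below is the tail of the store-scatter layout visitor, where the value layout, the (possibly transposed) tensor-descriptor layout, and a default 1-D mask layout are met into the corresponding operand lattices; it then continues with the RunLayoutInfoPropagation driver.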
    storeScatterLayout = valueLayout.getTransposedLayout({1, 0});
  }
  // Propagate the value layout.
  propagateIfChanged(operands[0], operands[0]->meet(valueLayout));
  // Propagate the tensor descriptor layout.
  propagateIfChanged(operands[1], operands[1]->meet(storeScatterLayout));
  // Use default 1D layout for mask operand.
  LayoutInfo maskLayout = getDefaultLayoutInfo(1);
  propagateIfChanged(operands[2], operands[2]->meet(maskLayout));
}

namespace {

//===----------------------------------------------------------------------===//
// RunLayoutInfoPropagation
//===----------------------------------------------------------------------===//

/// Driver class for running the LayoutInfoPropagation analysis.
class RunLayoutInfoPropagation {
public:
  MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(RunLayoutInfoPropagation)

  RunLayoutInfoPropagation(Operation *op) : target(op) {
    SymbolTableCollection symbolTable;
    solver.load<DeadCodeAnalysis>();
    solver.load<SparseConstantPropagation>();
    solver.load<LayoutInfoPropagation>(symbolTable);
    (void)solver.initializeAndRun(op);
  }

  LayoutInfo getLayoutInfo(Value val);

  void printAnalysisResult(llvm::raw_ostream &os);

private:
  DataFlowSolver solver;
  const Operation *target;
};
} // namespace

LayoutInfo RunLayoutInfoPropagation::getLayoutInfo(Value val) {
  auto *state = solver.lookupState<LayoutInfoLattice>(val);
  if (!state)
    return {};
  return state->getValue();
}

// Print the analysis result for debugging purposes.
[[maybe_unused]] void
RunLayoutInfoPropagation::printAnalysisResult(llvm::raw_ostream &os) {
  auto printFunctionResult = [&](FunctionOpInterface funcOp) {
    os << "function: " << funcOp.getName() << ":\n";
    // Function arguments
    for (BlockArgument arg : funcOp.getArguments()) {
      LayoutInfo layout = getLayoutInfo(arg);
      os << "argument: " << arg << "\n";
      os << "layout : ";
      layout.print(os);
      os << "\n";
    }
    // Function ops
    funcOp.walk([&](Operation *op) {
      // Skip ops that do not have results
      if (op->getResults().empty())
        return;
      os << "op : ";
      // For control-flow ops, print the op name only.
      if (isa<BranchOpInterface>(op) || isa<RegionBranchOpInterface>(op))
        os << op->getName();
      else
        op->print(os);
      os << "\n";
      // Print the layout for each result.
      for (auto [i, r] : llvm::enumerate(op->getResults())) {
        LayoutInfo layout = getLayoutInfo(r);
        os << "layout for result #" << i << ": ";
        layout.print(os);
        os << "\n";
      }
    });
  };

  SmallVector<FunctionOpInterface> funcOps;
  if (auto modOp = dyn_cast<ModuleOp>(target)) {
    for (auto funcOp : modOp.getOps<FunctionOpInterface>()) {
      funcOps.push_back(funcOp);
    }
    // Collect all GpuFuncOps in the module.
    for (auto gpuModOp : modOp.getOps<gpu::GPUModuleOp>()) {
      for (auto gpuFuncOp : gpuModOp.getOps<FunctionOpInterface>()) {
        funcOps.push_back(gpuFuncOp);
      }
    }
  }
  // Print the analysis result for each function.
  for (FunctionOpInterface funcOp : funcOps) {
    printFunctionResult(funcOp);
  }
}
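
For context, here is a minimal usage sketch of the driver above; the free function name dumpResultLayouts is hypothetical, while everything it calls appears in the excerpt.

```cpp
// Run the analysis over a root op and print the layout inferred for every op
// result, similar to what printAnalysisResult does per function.
static void dumpResultLayouts(Operation *root) {
  RunLayoutInfoPropagation analysis(root);
  root->walk([&](Operation *op) {
    for (auto [i, result] : llvm::enumerate(op->getResults())) {
      LayoutInfo layout = analysis.getLayoutInfo(result);
      llvm::outs() << op->getName() << " result #" << i << " layout: ";
      layout.print(llvm::outs());
      llvm::outs() << "\n";
    }
  });
}
```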

@charithaintc (Contributor, Author) commented Jun 5, 2025:

this part of the code does not require review because it is a copy-paste into a new file.

// CHECK-LABEL: func.func @binary_op_multiple_uses(
// CHECK-SAME: %[[ARG0:[0-9a-zA-Z]+]]: !xegpu.tensor_desc<8x16xf16, #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>>, %[[ARG1:[0-9a-zA-Z]+]]: !xegpu.tensor_desc<16x16xf16, #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>>, %[[ARG2:[0-9a-zA-Z]+]]: !xegpu.tensor_desc<8x16xf32, #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>>, %[[ARG3:[0-9a-zA-Z]+]]: !xegpu.tensor_desc<16x16xf16, #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>>) {
// CHECK: %[[T2:.*]] = arith.addf %{{.*}}, %{{.*}} {layout_operand_0 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>, layout_operand_1 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>, layout_result_0 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>} : vector<16x16xf16>
// CHECK: %[[T3:.*]] = xegpu.dpas %{{.*}}, %[[T2]] {layout_operand_0 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>, layout_operand_1 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>, layout_result_0 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>} : vector<8x16xf16>, vector<16x16xf16> -> vector<8x16xf32>

Review comment:

lane_data is [1, 1] for B as well?

@charithaintc (Contributor, Author) replied:

Yes. store_nd sets the layout for %2, so it is set to lane_layout = [1, 16], lane_data = [1, 1] (because of the backward analysis). This example will need a convert_layout.

@charithaintc (Contributor, Author) replied:

I get your point now. The current code does not handle layout conflicts; it naively attaches the layout assigned by the backward analysis.

This needs some improvement. Maybe in a follow-up PR?
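
For illustration only, the missing check could look roughly like the sketch below; the isAssigned() predicate and the equality comparison on LayoutInfo are assumptions, not code from this PR.

```cpp
// Conceptual only: a conflict exists when the layout the backward analysis
// assigned to a producer differs from the layout a consumer requires; the
// current code attaches the consumer-required layout without resolving this.
static bool hasLayoutConflict(const LayoutInfo &producer,
                              const LayoutInfo &consumer) {
  return producer.isAssigned() && consumer.isAssigned() &&
         !(producer == consumer);
}
```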

Review reply:

I am fine with a separate PR.

// CHECK-NEXT: %[[T7:.*]] = xegpu.update_nd_offset %[[ARG4]], [{{.*}}] : !xegpu.tensor_desc<8x16xf16, #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>>
// CHECK-NEXT: %[[T8:.*]] = xegpu.update_nd_offset %[[ARG5]], [{{.*}}] : !xegpu.tensor_desc<16x16xf16, #xegpu.layout<lane_layout = [1, 16], lane_data = [2, 1]>>
// CHECK-NEXT: scf.yield {layout_operand_2 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>} %[[T7]], %[[T8]], %[[T6]] : !xegpu.tensor_desc<8x16xf16, #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>>, !xegpu.tensor_desc<16x16xf16, #xegpu.layout<lane_layout = [1, 16], lane_data = [2, 1]>>, vector<8x16xf32>
// CHECK-NEXT: } {layout_operand_5 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>, layout_result_2 = #xegpu.layout<lane_layout = [1, 16], lane_data = [1, 1]>}

Review comment:

What are these layouts for? The loop outputs three variables, and they do not seem to match these layouts.

@charithaintc (Contributor, Author) replied:

The 5th (0-indexed) input to the loop is the C init value, and the 2nd (0-indexed) output is the C output value; layout_operand_5 and layout_result_2 refer to those.
