ObjectFifo lowering fixes (Xilinx#291)

* Improvements to tutorial README.
* Modified diagrams in tutorials 1, 3, 4, 5.
* Adjusted diagram sizes.
* ObjectFifo fixes.
* Clang format patch.
arkhodamoradi · Jan 26, 2023 · 8940958
1 parent 1a9835c

Showing 3 changed files with 131 additions and 20 deletions.
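The first hunk in AIEObjectFifoStatefulTransform.cpp makes `isSharedMemory` return `false`, with a share direction of 0, whenever either tile is a shim tile. Judging by the expectations in the new shimRow_mem_test.mlir, a shim-to-core objectFifo is then lowered through a shimDMA, a tile DMA and an `AIE.flow` rather than through a shared memory module. The C++ sketch below models only that decision. `Tile`, `isAdjacent` and `needsFlow` are hypothetical names. The rule that row 0 is the shim row is an assumption, `isAdjacent` is a deliberately simplified placeholder for the real `isLegalMemAffinity` rules, and the share-direction output is omitted.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for an AIE.tile: only the coordinates matter here.
struct Tile {
  int col;
  int row;
  // Assumption: shim tiles occupy row 0, as tile(7, 0) does in the test.
  bool isShimTile() const { return row == 0; }
};

// Simplified placeholder for isLegalMemAffinity (NOT the real AIE rules):
// treats a tile and its orthogonal neighbours as sharing memory.
bool isAdjacent(const Tile &a, const Tile &b) {
  return std::abs(a.col - b.col) + std::abs(a.row - b.row) <= 1;
}

// Mirrors the new early return: any pair involving a shim tile is treated
// as not sharing a memory module. Share-direction reporting is omitted.
bool isSharedMemory(const Tile &a, const Tile &b) {
  if (a.isShimTile() || b.isShimTile())
    return false;
  return isAdjacent(a, b);
}

// Hypothetical helper: pairs that do not share memory are the ones that
// need DMAs plus a flow between producer and consumer.
bool needsFlow(const Tile &producer, const Tile &consumer) {
  return !isSharedMemory(producer, consumer);
}
```

With this model, `Tile{7, 0}` and `Tile{7, 1}` are adjacent but still need a flow, because the producer is a shim tile. Without the early return, the adjacency check alone would have reported them as sharing memory.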
diff --git a/lib/Dialect/AIE/Transforms/AIEObjectFifoStatefulTransform.cpp b/lib/Dialect/AIE/Transforms/AIEObjectFifoStatefulTransform.cpp
@@ -168,6 +168,11 @@ struct AIEObjectFifoStatefulTransformPass
   ///   * 1 if it is that of the second input tile,
   ///   * 0 if no memory module is shared.
   bool isSharedMemory(TileOp a, TileOp b, int *share_direction) {
+    if (a.isShimTile() || b.isShimTile()) {
+      *share_direction = 0;
+      return false;
+    }
+
     bool rightShared = isLegalMemAffinity(a.colIndex(), a.rowIndex(),
                                           b.colIndex(), b.rowIndex());
 
@@ -885,7 +890,7 @@ struct AIEObjectFifoStatefulTransformPass
     }
 
     //===----------------------------------------------------------------------===//
-    // Create multicast and tile DMAs
+    // Create flows and tile DMAs
     //===----------------------------------------------------------------------===//
     for (auto [producer, consumers] : splitFifos) {
       // create producer tile DMA
@@ -894,14 +899,14 @@ struct AIEObjectFifoStatefulTransformPass
       createDMA(m, builder, producer, producerChan.first, producerChan.second,
                 0);
 
-      // create multicast
       for (auto consumer : consumers) {
         // create consumer tile DMA
         xilinx::AIE::DMAChannel consumerChan =
             dmaAnalysis.getSlaveDMAChannel(consumer.getProducerTile());
         createDMA(m, builder, consumer, consumerChan.first, consumerChan.second,
                   1);
 
+        // create flow
         builder.setInsertionPointAfter(producer);
         builder.create<FlowOp>(builder.getUnknownLoc(),
                                producer.getProducerTile(), WireBundle::DMA,
@@ -926,9 +931,9 @@ struct AIEObjectFifoStatefulTransformPass
           acquiresPerFifo; // maps each objFifo to indices of buffers acquired
                            // in latest subview of that objFifo (useful to
                            // cascade acquired elements to next AcquireOp)
-      std::vector<ObjectFifoReleaseOp>
+      DenseMap<ObjectFifoCreateOp, std::vector<ObjectFifoReleaseOp>>
           releaseOps; // useful to check which ReleaseOp has taken place before
-                      // an AcquireOp
+                      // an AcquireOp per objFifo
       DenseMap<ObjectFifoCreateOp, int>
           acqPerFifo; // maps each objFifo to its next index to acquire within
                       // this CoreOp
@@ -959,8 +964,13 @@ struct AIEObjectFifoStatefulTransformPass
         createUseLocks(builder, op, relPerFifo, numLocks, lockMode,
                        LockAction::Release);
 
-        // add release op to list
-        releaseOps.push_back(releaseOp);
+        // register release op
+        if (releaseOps.find(op) != releaseOps.end())
+          releaseOps[op].push_back(releaseOp);
+        else {
+          std::vector<ObjectFifoReleaseOp> release = {releaseOp};
+          releaseOps[op] = release;
+        }
       });
 
       //===----------------------------------------------------------------------===//
@@ -985,7 +995,7 @@ struct AIEObjectFifoStatefulTransformPass
         // check how many elements have been released in between this AcquireOp
         // and the previous one
         int numRel = 0;
-        for (auto relOp : releaseOps) {
+        for (auto relOp : releaseOps[op]) {
           ObjectFifoCreateOp otherOp =
               relOp.getFifo().getDefiningOp<ObjectFifoCreateOp>();
           // TODO: operations may not be in the same block: currently only
@@ -994,10 +1004,10 @@ struct AIEObjectFifoStatefulTransformPass
             if (acquireOp.getOperation()->getBlock() ==
                 relOp.getOperation()->getBlock()) {
               if (!acquireOp->isBeforeInBlock(relOp)) {
-                releaseOps.erase(
-                    releaseOps.begin()); // to ensure that we do not account the
-                                         // ReleaseOps again later, after the
-                                         // subview is created
+                releaseOps[op].erase(
+                    releaseOps[op].begin()); // to ensure that we do not account
+                                             // the ReleaseOps again later,
+                                             // after the subview is created
                 numRel += relOp.relNumber();
               }
             } else {
@@ -1006,10 +1016,11 @@ struct AIEObjectFifoStatefulTransformPass
               if (relOp.getOperation()->getBlock() ==
                   acqBlockDefOp->getBlock()) {
                 if (!acqBlockDefOp->isBeforeInBlock(relOp)) {
-                  releaseOps.erase(
-                      releaseOps.begin()); // to ensure that we do not account
-                                           // the ReleaseOps again later, after
-                                           // the subview is created
+                  releaseOps[op].erase(
+                      releaseOps[op]
+                          .begin()); // to ensure that we do not account
+                                     // the ReleaseOps again later, after
+                                     // the subview is created
                   numRel += relOp.relNumber();
                 }
               } else {
@@ -1018,10 +1029,11 @@ struct AIEObjectFifoStatefulTransformPass
                 if (acquireOp.getOperation()->getBlock() ==
                     relBlockDefOp->getBlock()) {
                   if (!acquireOp->isBeforeInBlock(relBlockDefOp)) {
-                    releaseOps.erase(
-                        releaseOps.begin()); // to ensure that we do not account
-                                             // the ReleaseOps again later,
-                                             // after the subview is created
+                    releaseOps[op].erase(
+                        releaseOps[op]
+                            .begin()); // to ensure that we do not account
+                                       // the ReleaseOps again later,
+                                       // after the subview is created
                     numRel += relOp.relNumber();
                   }
                 }
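The core of the fix replaces the single `releaseOps` vector with one vector per objectFifo. As we read the hunks, the old code erased `releaseOps.begin()` from a list shared by all objectFifos, and that entry need not be the ReleaseOp that had just been counted. The C++ sketch below is a hypothetical, self-contained model of the per-objectFifo bookkeeping, not the pass itself. `ReleaseOp`, `ReleaseMap`, `registerRelease` and `releasedBefore` are illustrative names, and `std::map` stands in for `llvm::DenseMap`. All ops are assumed to live in a single block, so `isBeforeInBlock` reduces to comparing positions.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical model of one release site inside a CoreOp. All ops are assumed
// to sit in one block, so "is before in block" is a comparison of positions.
struct ReleaseOp {
  int fifo;      // objectFifo being released (stands in for ObjectFifoCreateOp)
  int position;  // position of the op inside the block
  int relNumber; // number of elements released
};

// One list of pending releases per objectFifo, mirroring
// DenseMap<ObjectFifoCreateOp, std::vector<ObjectFifoReleaseOp>>.
using ReleaseMap = std::map<int, std::vector<ReleaseOp>>;

// Register releases in program order. operator[] default-constructs an
// empty vector the first time a fifo is seen.
void registerRelease(ReleaseMap &releaseOps, const ReleaseOp &rel) {
  releaseOps[rel.fifo].push_back(rel);
}

// Count the elements of `fifo` released before an acquire at `acquirePos`,
// removing those releases so a later acquire does not count them again.
// Only this fifo's list is touched, so other fifos' releases stay pending.
int releasedBefore(ReleaseMap &releaseOps, int fifo, int acquirePos) {
  std::vector<ReleaseOp> &pending = releaseOps[fifo];
  int numRel = 0;
  while (!pending.empty() && pending.front().position < acquirePos) {
    numRel += pending.front().relNumber;
    pending.erase(pending.begin());
  }
  return numRel;
}
```

`registerRelease` uses `operator[]` directly. Both `std::map` and `llvm::DenseMap` default-construct the mapped vector on first access, so this single line does the same work as the find/else branch added at the `// register release op` site. `releasedBefore` consumes entries with a `while` loop instead of erasing inside a range-based `for`, because `std::vector::erase` invalidates iterators at and after the erased position, including the one a range-based `for` over that same vector is holding.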

diff --git a/test/objectFifo-stateful-transform/shimRow_mem_test.mlir b/test/objectFifo-stateful-transform/shimRow_mem_test.mlir
@@ -0,0 +1,99 @@
+//===- shimRow_mem_test.mlir --------------------------*- MLIR -*-===//
+//
+// This file is licensed under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+// Copyright (C) 2022, Xilinx Inc.
+// Copyright (C) 2022, Advanced Micro Devices, Inc.
+//
+// Date: January 26th 2023
+// 
+//===----------------------------------------------------------------------===//
+
+// RUN: aie-opt --aie-objectFifo-stateful-transform %s | FileCheck %s
+
+// CHECK: module @shimRow_mem {
+// CHECK:   %0 = AIE.tile(7, 1)
+// CHECK:   %1 = AIE.tile(7, 0)
+// CHECK:   AIE.flow(%1, DMA : 0, %0, DMA : 0)
+// CHECK:   %2 = AIE.lock(%1, 0) {sym_name = "of_0_lock_0"}
+// CHECK:   %3 = AIE.buffer(%0) {sym_name = "of_1_buff_0"} : memref<16xi32>
+// CHECK:   %4 = AIE.lock(%0, 0) {sym_name = "of_1_lock_0"}
+// CHECK:   %5 = AIE.buffer(%0) {sym_name = "of_1_buff_1"} : memref<16xi32>
+// CHECK:   %6 = AIE.lock(%0, 1) {sym_name = "of_1_lock_1"}
+// CHECK:   %7 = AIE.buffer(%0) {sym_name = "of_1_buff_2"} : memref<16xi32>
+// CHECK:   %8 = AIE.lock(%0, 2) {sym_name = "of_1_lock_2"}
+// CHECK:   %9 = AIE.external_buffer {sym_name = "ext_buffer_in"} : memref<64xi32>
+// CHECK:   func.func @some_work(%arg0: memref<16xi32>, %arg1: memref<16xi32>) {
+// CHECK:     return
+// CHECK:   }
+// CHECK:   %10 = AIE.core(%0) {
+// CHECK:     %c0 = arith.constant 0 : index
+// CHECK:     %c1 = arith.constant 1 : index
+// CHECK:     %c12 = arith.constant 12 : index
+// CHECK:     AIE.useLock(%4, Acquire, 1)
+// CHECK:     AIE.useLock(%6, Acquire, 1)
+// CHECK:     func.call @some_work(%3, %5) : (memref<16xi32>, memref<16xi32>) -> ()
+// CHECK:     AIE.useLock(%4, Release, 0)
+// CHECK:     AIE.end
+// CHECK:   }
+// CHECK:   %11 = AIE.shimDMA(%1) {
+// CHECK:     %13 = AIE.dmaStart(MM2S, 0, ^bb1, ^bb2)
+// CHECK:   ^bb1:  // 2 preds: ^bb0, ^bb1
+// CHECK:     AIE.useLock(%2, Acquire, 1)
+// CHECK:     AIE.dmaBd(<%9 : memref<64xi32>, 0, 64>, 0)
+// CHECK:     AIE.useLock(%2, Release, 0)
+// CHECK:     AIE.nextBd ^bb1
+// CHECK:   ^bb2:  // pred: ^bb0
+// CHECK:     AIE.end
+// CHECK:   }
+// CHECK:   %12 = AIE.mem(%0) {
+// CHECK:     %13 = AIE.dmaStart(S2MM, 0, ^bb1, ^bb4)
+// CHECK:   ^bb1:  // 2 preds: ^bb0, ^bb3
+// CHECK:     AIE.useLock(%4, Acquire, 0)
+// CHECK:     AIE.dmaBd(<%3 : memref<16xi32>, 0, 16>, 0)
+// CHECK:     AIE.useLock(%4, Release, 1)
+// CHECK:     AIE.nextBd ^bb2
+// CHECK:   ^bb2:  // pred: ^bb1
+// CHECK:     AIE.useLock(%6, Acquire, 0)
+// CHECK:     AIE.dmaBd(<%5 : memref<16xi32>, 0, 16>, 0)
+// CHECK:     AIE.useLock(%6, Release, 1)
+// CHECK:     AIE.nextBd ^bb3
+// CHECK:   ^bb3:  // pred: ^bb2
+// CHECK:     AIE.useLock(%8, Acquire, 0)
+// CHECK:     AIE.dmaBd(<%7 : memref<16xi32>, 0, 16>, 0)
+// CHECK:     AIE.useLock(%8, Release, 1)
+// CHECK:     AIE.nextBd ^bb1
+// CHECK:   ^bb4:  // pred: ^bb0
+// CHECK:     AIE.end
+// CHECK:   }
+// CHECK: }
+
+module @shimRow_mem {
+    %tile71 = AIE.tile(7, 1)
+    %tile70 = AIE.tile(7, 0)
+
+    %objFifo = AIE.objectFifo.createObjectFifo(%tile70, {%tile71}, 3) : !AIE.objectFifo<memref<16xi32>>
+
+    %ext_buffer_in  = AIE.external_buffer {sym_name = "ext_buffer_in"}: memref<64xi32>
+    AIE.objectFifo.registerExternalBuffers(%tile70, %objFifo : !AIE.objectFifo<memref<16xi32>>, {%ext_buffer_in}) : (memref<64xi32>)
+
+    func.func @some_work(%a : memref<16xi32>, %b : memref<16xi32>) -> () {
+        return
+    }
+
+    %core71 = AIE.core(%tile71) {
+        %c0 = arith.constant 0 : index
+        %c1 = arith.constant 1 : index
+        %height = arith.constant 12 : index
+
+        %subview = AIE.objectFifo.acquire<Consume>(%objFifo : !AIE.objectFifo<memref<16xi32>>, 2) : !AIE.objectFifoSubview<memref<16xi32>>
+        %elem0 = AIE.objectFifo.subview.access %subview[0] : !AIE.objectFifoSubview<memref<16xi32>> -> memref<16xi32>
+        %elem1 = AIE.objectFifo.subview.access %subview[1] : !AIE.objectFifoSubview<memref<16xi32>> -> memref<16xi32>
+        func.call @some_work(%elem0, %elem1) : (memref<16xi32>, memref<16xi32>) -> ()
+        AIE.objectFifo.release<Consume>(%objFifo : !AIE.objectFifo<memref<16xi32>>, 1)
+
+        AIE.end
+    }
+}
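In the test above, the consumer core acquires two elements of a depth-3 objectFifo and then releases one. The expected output acquires the locks of buffers 0 and 1 (`%4` and `%6`) with value 1 and releases only the lock of buffer 0 with value 0. The S2MM tile DMA does the reverse, acquiring with 0 and releasing with 1. The C++ sketch below is a hypothetical model, not the pass's implementation, that reproduces the core's lock sequence. Elements already held are not re-acquired, and each release advances a circular head index. `ObjectFifoConsumer` and `LockEvent` are illustrative names.

```cpp
#include <cassert>
#include <vector>

// One modelled AIE.useLock call: action, lock index and lock value.
struct LockEvent {
  bool acquire; // true for Acquire, false for Release
  int lock;     // index of the objectFifo buffer/lock pair
  int value;    // lock value used by the consumer core
};

// Hypothetical model of a consumer core's view of an objectFifo with `depth`
// buffers: acquires take the next buffers in circular order, and releases
// free the oldest held buffer.
class ObjectFifoConsumer {
public:
  explicit ObjectFifoConsumer(int depth) : depth_(depth) {}

  // Acquire until `n` elements are held. Elements still held from a previous
  // acquire are reused rather than acquired again.
  void acquire(int n) {
    assert(n <= depth_);
    while (held_ < n) {
      events_.push_back({true, (head_ + held_) % depth_, 1});
      ++held_;
    }
  }

  // Release the `n` oldest held elements.
  void release(int n) {
    assert(n <= held_);
    for (int i = 0; i < n; ++i) {
      events_.push_back({false, head_, 0});
      head_ = (head_ + 1) % depth_;
      --held_;
    }
  }

  const std::vector<LockEvent> &events() const { return events_; }

private:
  int depth_;
  int head_ = 0; // index of the oldest held buffer
  int held_ = 0; // number of elements currently held by the core
  std::vector<LockEvent> events_;
};
```

With `ObjectFifoConsumer of(3)`, calling `of.acquire(2)` and then `of.release(1)` yields Acquire on lock 0, Acquire on lock 1 and Release on lock 0, matching the `AIE.useLock` lines in the CHECK block. A further `of.acquire(2)` in this model would only acquire lock 2, since buffer 1 is still held.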
diff --git a/tutorials/README.md b/tutorials/README.md
@@ -19,7 +19,7 @@ The context that the `mlir-aie` dialect sits with respect to other MLIR dialects
 
 Here, we see that `mlir-aie` is part of a larger ecosystem of open source dialects that allows customized tool development targeting AMD devices. `mlir-aie` can be used to generate low-level configuration for the AIEngine portion of Versal devices, including processors, stream switches, TileDMA and ShimDMA blocks. Backend code generation is included, targeting the LibXAIE library. In the tutorial examples, the configuration is used by host code that executes the generated configuration, and makes use of APIs in the [mlir-aie/runtime_lib](https://github.com/Xilinx/mlir-aie/tree/main/runtime_lib) directory to interface with the design.
 
-This design tutorial will help guide someone new to MLIR through the steps of building increasingly complex multi-core designs. In order to understand this MLIR-based representation for AI Engine design, it is important to first understand overall AI Engine architecture.
+This design tutorial will help guide someone new to MLIR through the steps of building increasingly complex multi-core designs. It can be consumed in different ways, depending on the user's end goal. For a detailed walkthrough of the `mlir-aie` dialect at the physical level, the tutorials should be followed in order and all the subdirectories of each tutorial should be read. For a higher-level description of the dialect, the tutorials should still be followed in order, but reading only the first 5 tutorials will suffice to start building multi-core designs at a higher abstraction level without an in-depth understanding of the low-level details. The subdirectories in each tutorial can thus be skipped, with the exception of those in tutorial-3, which introduces the high-level abstraction of the `mlir-aie` dialect.
 
 The individual tutorials are listed below along with the AI Engine architecture topics they cover. Following this is
 a more detailed description of the architecture, ending with an overview of how each tutorial maps onto it.