ObjectFifo lowering fixes (Xilinx#291)
* Improvements to tutorial README.

* Modified diagrams in tutorials 1,3,4,5.

* Adjusted diagram sizes.

* ObjectFifo fixes.

* Clang format patch.
AndraBisca authored Jan 26, 2023
1 parent 1a9835c commit 8940958
Showing 3 changed files with 131 additions and 20 deletions.
50 changes: 31 additions & 19 deletions lib/Dialect/AIE/Transforms/AIEObjectFifoStatefulTransform.cpp
@@ -168,6 +168,11 @@ struct AIEObjectFifoStatefulTransformPass
/// * 1 if it is that of the second input tile,
/// * 0 if no memory module is shared.
bool isSharedMemory(TileOp a, TileOp b, int *share_direction) {
if (a.isShimTile() || b.isShimTile()) {
*share_direction = 0;
return false;
}

bool rightShared = isLegalMemAffinity(a.colIndex(), a.rowIndex(),
b.colIndex(), b.rowIndex());
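
The early return added in this hunk can be illustrated with a small standalone sketch. Everything here is a stand-in: `Tile`, `toyMemAffinity`, the "row 0 is the shim row" rule, and the `-1` direction value are assumptions made for illustration, not the pass's real `TileOp` or `isLegalMemAffinity`. The point is only that a shim tile must be rejected before any neighbor check runs, since a purely geometric rule could otherwise report shared memory with the tile above it.

```cpp
#include <cassert>

// Hypothetical stand-in for TileOp; only what the sketch needs.
struct Tile {
  int col;
  int row;
  // Assumption for this sketch: the shim row is row 0.
  bool isShimTile() const { return row == 0; }
};

// Toy affinity rule, NOT the real isLegalMemAffinity: the tile at (ac, ar)
// can reach the memory of the tile directly above it.
bool toyMemAffinity(int ac, int ar, int bc, int br) {
  return ac == bc && br == ar + 1;
}

// Returns true if a and b share a memory module. shareDirection is set to
// 1 for the second tile's memory, -1 for the first tile's memory (assumed
// convention), and 0 if nothing is shared.
bool isSharedMemory(const Tile &a, const Tile &b, int *shareDirection) {
  assert(shareDirection != nullptr);

  // Guard from the diff: shim tiles never share memory with a neighbor.
  if (a.isShimTile() || b.isShimTile()) {
    *shareDirection = 0;
    return false;
  }

  bool rightShared = toyMemAffinity(a.col, a.row, b.col, b.row);
  bool leftShared = toyMemAffinity(b.col, b.row, a.col, a.row);
  if (leftShared)
    *shareDirection = -1;
  else if (rightShared)
    *shareDirection = 1;
  else
    *shareDirection = 0;
  return leftShared || rightShared;
}
```

Without the guard, the toy rule would report tile (7, 0) and tile (7, 1) as sharing memory, which is the situation the new `shimRow_mem_test.mlir` test exercises.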

@@ -885,7 +890,7 @@ struct AIEObjectFifoStatefulTransformPass
}

//===----------------------------------------------------------------------===//
// Create multicast and tile DMAs
// Create flows and tile DMAs
//===----------------------------------------------------------------------===//
for (auto [producer, consumers] : splitFifos) {
// create producer tile DMA
@@ -894,14 +899,14 @@ struct AIEObjectFifoStatefulTransformPass
createDMA(m, builder, producer, producerChan.first, producerChan.second,
0);

// create multicast
for (auto consumer : consumers) {
// create consumer tile DMA
xilinx::AIE::DMAChannel consumerChan =
dmaAnalysis.getSlaveDMAChannel(consumer.getProducerTile());
createDMA(m, builder, consumer, consumerChan.first, consumerChan.second,
1);

// create flow
builder.setInsertionPointAfter(producer);
builder.create<FlowOp>(builder.getUnknownLoc(),
producer.getProducerTile(), WireBundle::DMA,
@@ -926,9 +931,9 @@ struct AIEObjectFifoStatefulTransformPass
acquiresPerFifo; // maps each objFifo to indices of buffers acquired
// in latest subview of that objFifo (useful to
// cascade acquired elements to next AcquireOp)
std::vector<ObjectFifoReleaseOp>
DenseMap<ObjectFifoCreateOp, std::vector<ObjectFifoReleaseOp>>
releaseOps; // useful to check which ReleaseOp has taken place before
// an AcquireOp
// an AcquireOp per objFifo
DenseMap<ObjectFifoCreateOp, int>
acqPerFifo; // maps each objFifo to its next index to acquire within
// this CoreOp
@@ -959,8 +964,13 @@ struct AIEObjectFifoStatefulTransformPass
createUseLocks(builder, op, relPerFifo, numLocks, lockMode,
LockAction::Release);

// add release op to list
releaseOps.push_back(releaseOp);
// register release op
if (releaseOps.find(op) != releaseOps.end())
releaseOps[op].push_back(releaseOp);
else {
std::vector<ObjectFifoReleaseOp> release = {releaseOp};
releaseOps[op] = release;
}
});

//===----------------------------------------------------------------------===//
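
The per-fifo registration in this hunk can be sketched outside of MLIR. The sketch below uses `std::unordered_map` as a stand-in for `DenseMap`, and the aliases `FifoId` and `ReleaseId` are hypothetical replacements for `ObjectFifoCreateOp` and `ObjectFifoReleaseOp`. It shows the find/else form from the diff next to a shorter form that relies on `operator[]` default-constructing an empty vector for a missing key; both should leave the map in the same state.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

using FifoId = std::string; // hypothetical stand-in for ObjectFifoCreateOp
using ReleaseId = int;      // hypothetical stand-in for ObjectFifoReleaseOp
using ReleaseMap = std::unordered_map<FifoId, std::vector<ReleaseId>>;

// Mirrors the find/else registration shown in the diff.
void registerReleaseExplicit(ReleaseMap &releaseOps, const FifoId &fifo,
                             ReleaseId rel) {
  auto it = releaseOps.find(fifo);
  if (it != releaseOps.end())
    it->second.push_back(rel);
  else
    releaseOps[fifo] = {rel};
}

// Shorter form: operator[] creates an empty vector when the key is missing,
// so a single push_back covers both cases.
void registerRelease(ReleaseMap &releaseOps, const FifoId &fifo,
                     ReleaseId rel) {
  releaseOps[fifo].push_back(rel);
}
```

Keying the list by fifo is what lets a later acquire on one objectFifo ignore releases that belong to another.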
@@ -985,7 +995,7 @@ struct AIEObjectFifoStatefulTransformPass
// check how many elements have been released in between this AcquireOp
// and the previous one
int numRel = 0;
for (auto relOp : releaseOps) {
for (auto relOp : releaseOps[op]) {
ObjectFifoCreateOp otherOp =
relOp.getFifo().getDefiningOp<ObjectFifoCreateOp>();
// TODO: operations may not be in the same block: currently only
@@ -994,10 +1004,10 @@ struct AIEObjectFifoStatefulTransformPass
if (acquireOp.getOperation()->getBlock() ==
relOp.getOperation()->getBlock()) {
if (!acquireOp->isBeforeInBlock(relOp)) {
releaseOps.erase(
releaseOps.begin()); // to ensure that we do not account the
// ReleaseOps again later, after the
// subview is created
releaseOps[op].erase(
releaseOps[op].begin()); // to ensure that we do not account
// the ReleaseOps again later,
// after the subview is created
numRel += relOp.relNumber();
}
} else {
@@ -1006,10 +1016,11 @@ struct AIEObjectFifoStatefulTransformPass
if (relOp.getOperation()->getBlock() ==
acqBlockDefOp->getBlock()) {
if (!acqBlockDefOp->isBeforeInBlock(relOp)) {
releaseOps.erase(
releaseOps.begin()); // to ensure that we do not account
// the ReleaseOps again later, after
// the subview is created
releaseOps[op].erase(
releaseOps[op]
.begin()); // to ensure that we do not account
// the ReleaseOps again later, after
// the subview is created
numRel += relOp.relNumber();
}
} else {
@@ -1018,10 +1029,11 @@ struct AIEObjectFifoStatefulTransformPass
if (acquireOp.getOperation()->getBlock() ==
relBlockDefOp->getBlock()) {
if (!acquireOp->isBeforeInBlock(relBlockDefOp)) {
releaseOps.erase(
releaseOps.begin()); // to ensure that we do not account
// the ReleaseOps again later,
// after the subview is created
releaseOps[op].erase(
releaseOps[op]
.begin()); // to ensure that we do not account
// the ReleaseOps again later,
// after the subview is created
numRel += relOp.relNumber();
}
}
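
The counting logic in these hunks (sum the elements released on the same objectFifo before an acquire, and drop those releases so they are not counted again) can be sketched as a queue per fifo. This is an illustrative alternative, not the pass's code: `Release`, `PendingReleases`, `takeReleasesBefore`, and the integer `position` standing in for `isBeforeInBlock` are all hypothetical. Popping from the front of a `std::deque` in a `while` loop also avoids erasing from a container while a range-based `for` is iterating over it.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <unordered_map>

// Hypothetical model of a release op.
struct Release {
  int position; // index of the op within its block (stand-in for op order)
  int count;    // number of elements released (relNumber() in the pass)
};

// Releases per fifo, kept in program order.
using PendingReleases = std::unordered_map<std::string, std::deque<Release>>;

// Sums and removes every pending release on `fifo` that occurs before the
// acquire at `acquirePosition`. Releases on other fifos are left untouched.
int takeReleasesBefore(PendingReleases &pending, const std::string &fifo,
                       int acquirePosition) {
  auto it = pending.find(fifo);
  if (it == pending.end())
    return 0;

  std::deque<Release> &queue = it->second;
  int numRel = 0;
  while (!queue.empty() && queue.front().position < acquirePosition) {
    numRel += queue.front().count;
    queue.pop_front(); // consumed, so a later acquire will not count it again
  }
  return numRel;
}
```

The sketch only handles ops in a single block; the nested-block cases handled in the diff would need the parent-op lookup the pass performs.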
99 changes: 99 additions & 0 deletions test/objectFifo-stateful-transform/shimRow_mem_test.mlir
@@ -0,0 +1,99 @@
//===- shimRow_mem_test.mlir --------------------------*- MLIR -*-===//
//
// This file is licensed under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
// Copyright (C) 2022, Xilinx Inc.
// Copyright (C) 2022, Advanced Micro Devices, Inc.
//
// Date: January 26th 2023
//
//===----------------------------------------------------------------------===//

// RUN: aie-opt --aie-objectFifo-stateful-transform %s | FileCheck %s

// CHECK: module @shimRow_mem {
// CHECK: %0 = AIE.tile(7, 1)
// CHECK: %1 = AIE.tile(7, 0)
// CHECK: AIE.flow(%1, DMA : 0, %0, DMA : 0)
// CHECK: %2 = AIE.lock(%1, 0) {sym_name = "of_0_lock_0"}
// CHECK: %3 = AIE.buffer(%0) {sym_name = "of_1_buff_0"} : memref<16xi32>
// CHECK: %4 = AIE.lock(%0, 0) {sym_name = "of_1_lock_0"}
// CHECK: %5 = AIE.buffer(%0) {sym_name = "of_1_buff_1"} : memref<16xi32>
// CHECK: %6 = AIE.lock(%0, 1) {sym_name = "of_1_lock_1"}
// CHECK: %7 = AIE.buffer(%0) {sym_name = "of_1_buff_2"} : memref<16xi32>
// CHECK: %8 = AIE.lock(%0, 2) {sym_name = "of_1_lock_2"}
// CHECK: %9 = AIE.external_buffer {sym_name = "ext_buffer_in"} : memref<64xi32>
// CHECK: func.func @some_work(%arg0: memref<16xi32>, %arg1: memref<16xi32>) {
// CHECK: return
// CHECK: }
// CHECK: %10 = AIE.core(%0) {
// CHECK: %c0 = arith.constant 0 : index
// CHECK: %c1 = arith.constant 1 : index
// CHECK: %c12 = arith.constant 12 : index
// CHECK: AIE.useLock(%4, Acquire, 1)
// CHECK: AIE.useLock(%6, Acquire, 1)
// CHECK: func.call @some_work(%3, %5) : (memref<16xi32>, memref<16xi32>) -> ()
// CHECK: AIE.useLock(%4, Release, 0)
// CHECK: AIE.end
// CHECK: }
// CHECK: %11 = AIE.shimDMA(%1) {
// CHECK: %13 = AIE.dmaStart(MM2S, 0, ^bb1, ^bb2)
// CHECK: ^bb1: // 2 preds: ^bb0, ^bb1
// CHECK: AIE.useLock(%2, Acquire, 1)
// CHECK: AIE.dmaBd(<%9 : memref<64xi32>, 0, 64>, 0)
// CHECK: AIE.useLock(%2, Release, 0)
// CHECK: AIE.nextBd ^bb1
// CHECK: ^bb2: // pred: ^bb0
// CHECK: AIE.end
// CHECK: }
// CHECK: %12 = AIE.mem(%0) {
// CHECK: %13 = AIE.dmaStart(S2MM, 0, ^bb1, ^bb4)
// CHECK: ^bb1: // 2 preds: ^bb0, ^bb3
// CHECK: AIE.useLock(%4, Acquire, 0)
// CHECK: AIE.dmaBd(<%3 : memref<16xi32>, 0, 16>, 0)
// CHECK: AIE.useLock(%4, Release, 1)
// CHECK: AIE.nextBd ^bb2
// CHECK: ^bb2: // pred: ^bb1
// CHECK: AIE.useLock(%6, Acquire, 0)
// CHECK: AIE.dmaBd(<%5 : memref<16xi32>, 0, 16>, 0)
// CHECK: AIE.useLock(%6, Release, 1)
// CHECK: AIE.nextBd ^bb3
// CHECK: ^bb3: // pred: ^bb2
// CHECK: AIE.useLock(%8, Acquire, 0)
// CHECK: AIE.dmaBd(<%7 : memref<16xi32>, 0, 16>, 0)
// CHECK: AIE.useLock(%8, Release, 1)
// CHECK: AIE.nextBd ^bb1
// CHECK: ^bb4: // pred: ^bb0
// CHECK: AIE.end
// CHECK: }
// CHECK: }

module @shimRow_mem {
%tile71 = AIE.tile(7, 1)
%tile70 = AIE.tile(7, 0)

%objFifo = AIE.objectFifo.createObjectFifo(%tile70, {%tile71}, 3) : !AIE.objectFifo<memref<16xi32>>

%ext_buffer_in = AIE.external_buffer {sym_name = "ext_buffer_in"}: memref<64xi32>
AIE.objectFifo.registerExternalBuffers(%tile70, %objFifo : !AIE.objectFifo<memref<16xi32>>, {%ext_buffer_in}) : (memref<64xi32>)

func.func @some_work(%a : memref<16xi32>, %b : memref<16xi32>) -> () {
return
}

%core71 = AIE.core(%tile71) {
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%height = arith.constant 12 : index

%subview = AIE.objectFifo.acquire<Consume>(%objFifo : !AIE.objectFifo<memref<16xi32>>, 2) : !AIE.objectFifoSubview<memref<16xi32>>
%elem0 = AIE.objectFifo.subview.access %subview[0] : !AIE.objectFifoSubview<memref<16xi32>> -> memref<16xi32>
%elem1 = AIE.objectFifo.subview.access %subview[1] : !AIE.objectFifoSubview<memref<16xi32>> -> memref<16xi32>
func.call @some_work(%elem0, %elem1) : (memref<16xi32>, memref<16xi32>) -> ()
AIE.objectFifo.release<Consume>(%objFifo : !AIE.objectFifo<memref<16xi32>>, 1)

AIE.end
}
}
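
The lock pattern that the CHECK lines expect (the tile DMA acquires each lock at 0 and releases it at 1; the core acquires two locks at 1 and releases the first at 0) can be explored with a toy model. `ConsumerFifoModel` and its methods are hypothetical names, and the model assumes lock value 0 means "buffer empty" and 1 means "buffer full"; it is a simplification for reasoning about the test, not a description of the hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the consumer side of a depth-N objectFifo.
// Assumed lock convention: 0 = empty (DMA may fill), 1 = full (core may read).
class ConsumerFifoModel {
public:
  explicit ConsumerFifoModel(std::size_t depth) : locks_(depth, 0) {}

  // DMA side (S2MM): fill the next buffer if it is empty.
  // Returns false where real hardware would stall.
  bool dmaFillNext() {
    if (locks_[dmaIndex_] != 0)
      return false;
    locks_[dmaIndex_] = 1;
    dmaIndex_ = (dmaIndex_ + 1) % locks_.size();
    return true;
  }

  // Core side: make `count` elements available. Only elements not already
  // held are acquired. Returns the indices of the newly acquired locks.
  std::vector<std::size_t> acquire(std::size_t count) {
    assert(count <= locks_.size());
    std::vector<std::size_t> acquired;
    for (std::size_t i = held_; i < count; ++i) {
      std::size_t idx = (head_ + i) % locks_.size();
      assert(locks_[idx] == 1 && "real hardware would stall here");
      acquired.push_back(idx);
    }
    if (count > held_)
      held_ = count;
    return acquired;
  }

  // Core side: release the `count` oldest held elements back to the DMA.
  // Returns the indices of the released locks.
  std::vector<std::size_t> release(std::size_t count) {
    assert(count <= held_);
    std::vector<std::size_t> released;
    for (std::size_t i = 0; i < count; ++i) {
      locks_[head_] = 0;
      released.push_back(head_);
      head_ = (head_ + 1) % locks_.size();
      --held_;
    }
    return released;
  }

private:
  std::vector<int> locks_;
  std::size_t dmaIndex_ = 0; // next buffer the DMA will fill
  std::size_t head_ = 0;     // oldest element held or next to be acquired
  std::size_t held_ = 0;     // elements currently held by the core
};
```

With a depth of 3, an acquire of 2 followed by a release of 1 touches locks 0 and 1 and then lock 0, which lines up with the `of_1_lock_0` and `of_1_lock_1` uses in the core body of the test.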
2 changes: 1 addition & 1 deletion tutorials/README.md
@@ -19,7 +19,7 @@ The context that the `mlir-aie` dialect sits with respect to other MLIR dialects

Here, we see that `mlir-aie` is part of a larger ecosystem of open source dialects that allows customized tool development targeting AMD devices. `mlir-aie` can be used to generate low-level configuration for the AIEngine portion of Versal devices, including processors, stream switches, TileDMA and ShimDMA blocks. Backend code generation is included, targeting the LibXAIE library. In the tutorial examples, the configuration is used by host code that executes the generated configuration, and makes use of APIs in the [mlir-aie/runtime_lib](https://github.com/Xilinx/mlir-aie/tree/main/runtime_lib) directory to interface with the design.

This design tutorial will help guide someone new to MLIR through the steps of building increasingly complex multi-core designs. In order to understand this MLIR-based representation for AI Engine design, it is important to first understand overall AI Engine architecture.
This design tutorial will help guide someone new to MLIR through the steps of building increasingly complex multi-core designs. It can be consumed in different ways, depending on the user's end goal. For a detailed walkthrough of the `mlir-aie` dialect at the physical level, follow the tutorials in order and read all the subdirectories of each tutorial. For a higher-level description of the dialect, still follow the tutorials in order, but reading only the first 5 tutorials is enough to start building multi-core designs at a higher abstraction level without an in-depth understanding of the low-level details. In that case the subdirectories in each tutorial can be skipped (with the exception of tutorial-3, which introduces the high-level abstraction of the `mlir-aie` dialect).

The individual tutorials are listed below along with the AI Engine architecture topics they cover. Following this is a
more detailed description of the architecture, ending with an overview of how each tutorial maps onto it.