Commit afdfbe3

[CIR][Doc] Improve documentation with design and context
1 parent 881aefb commit afdfbe3

2 files changed: +271 -7 lines changed


README.md

+1-1
@@ -1,4 +1,4 @@
This is a huge work-in-progress version adding the [`aie++` C++ programming
model](docs/CIR.md) to MLIR AIE based on [ClangIR for MLIR AIE
fork](https://github.com/keryell/clangir/tree/mlir-aie-version).

docs/CIR.md

+270-6
@@ -15,10 +15,19 @@ AIE/IRON as a prototyping phase with the goal of generalizing it as a
configurable framework to handle, for example, MLIR AIR and spensors (to be
sensored on-line).

Since most of the ACDC software stack is MLIR-based, we are using
[ClangIR](https://github.com/llvm/clangir) to generate MLIR from C/C++.
ClangIR is a fork of Clang that generates the MLIR CIR dialect during code
generation. It looks like the most promising approach since it is in the
process of being upstreamed to Clang/LLVM/MLIR and is backed by several
companies.

This project requires the ClangIR compiler flow going from MLIR CIR to the
MLIR standard dialects, which is unfortunately much less implemented than the
direct ClangIR CIR → LLVM IR lowering. This explains why most of the effort in
the `aie++` project has been diverted into implementing the CIR → MLIR flow
itself. There are also some fundamental limitations in the MLIR standard
dialects themselves which have to be overcome.

The current AIE++ prototype has 2 components:

@@ -43,13 +52,54 @@ There is no end-to-end working flow or integration with
There is a script running the compilation flow for the device part of the
program and leaving a local file for each intermediate phase for inspection.

A simple program like [`example.cpp`](../test/CIR/aie++/example.cpp):

```c++
#include "aie++.hpp"

int main() {
  aie::device<aie::npu1> d;
  auto t = d.tile<1, 4>();
  auto b = t.buffer<int, 8192>();
  t.program([&] { b[3] = 14; });
  d.tile<2, 3>().program([] {});
  d.run();
}
```

can be compiled with [`aie++-compile.sh`](../utils/aie++-compile.sh):

```bash
PATH=$LLVM_DIR/build/bin:$MLIR_AIE_HOME/build/bin:$PATH $MLIR_AIE_HOME/utils/aie++-compile.sh example.cpp
```

to produce, among others, an `example.aie.only.cir` file:

```MLIR
aie.device(npu1) {
  // [...lots of skipped functions...]
  %tile_1_4 = aie.tile(1, 4) {cir.type = !cir.ptr<!ty_aie3A3Atile3C12C_42C_aie3A3Adevice3C3E3E>}
  %buffer_1_4 = aie.buffer(%tile_1_4) {cir.type = !cir.ptr<!ty_aie3A3Abuffer3Cint2C_8192UL3E>} : memref<8192xi32>
  %1 = builtin.unrealized_conversion_cast %buffer_1_4 : memref<8192xi32> to !cir.ptr<!ty_aie3A3Abuffer3Cint2C_8192UL3E>
  %core_1_4 = aie.core(%tile_1_4) {
    cir.scope {
      %3 = cir.const #cir.int<14> : !s32i
      %4 = cir.const #cir.int<3> : !s32i
      %5 = cir.cast(integral, %4 : !s32i), !u64i
      %6 = cir.call @_ZN3aie6bufferIiLm8192EEixEm(%1, %5) : (!cir.ptr<!ty_aie3A3Abuffer3Cint2C_8192UL3E>, !u64i) -> !cir.ptr<!s32i>
      cir.store %3, %6 : !s32i, !cir.ptr<!s32i>
    }
    aie.end
  }
  %tile_2_3 = aie.tile(2, 3) {cir.type = !cir.ptr<!ty_aie3A3Atile3C22C_32C_aie3A3Adevice3C3E3E>}
  %2 = builtin.unrealized_conversion_cast %tile_2_3 : index to !cir.ptr<!ty_aie3A3Atile3C22C_32C_aie3A3Adevice3C3E3E>
  %core_2_3 = aie.core(%tile_2_3) {
    cir.scope {
    }
    aie.end
  }
} {cir.type = !cir.ptr<!ty_aie3A3Adevice3Caie3A3Anpu12C_aie3A3A28lambda_at_2E2Faie2B2B2Ehpp3A2183A63293E>}
```

and to generate other local files like:

```
example.cir
@@ -162,8 +212,222 @@ extensions for [AIE](https://github.com/triSYCL/sycl) to represent everything in
pure modern C++ with classes and lambdas in a type-safe way, having together in
the same program the host code and the device code.

## C++ to MLIR strategy

Several solutions have been studied:

### Plan A: Polygeist + MLIR-AIR/MLIR-AIE

[Polygeist](https://github.com/llvm/Polygeist)

- C++ front-end written from scratch, started as PhD work at MIT

- Used by ACDC in the past

- Not able to parse any real program (e.g. using `std::`)

- Was lagging behind by being based on a quite old LLVM version

- Not possible to use with current MLIR-AIR/MLIR-AIE

- Difficult to modernize the code base, even if other people try to do it
  from time to time…

### Plan B: SYCL MLIR + MLIR-AIR/MLIR-AIE

[SYCL MLIR](https://github.com/intel/llvm/tree/sycl-mlir)

- Intel has developed a SYCL MLIR branch with a single-source C++ parser (host +
  device)

- Polygeist fork by Intel

- Extended and rebased on the latest version of LLVM

- Idea for AIE: leverage Intel engineering

- MLIR-AIR/MLIR-AIE + C++ part of SYCL

Problems:

- Only device code goes through Polygeist, because it is not robust enough to
  handle host code

- Host C++ code goes through the usual Clang + LLVM → MLIR LLVM path and is
  then raised to some MLIR SYCL & other dialects

- Not generic enough for plain C++ but could be adapted to AIE++

- Intel started deprecating this project and moving to ClangIR at the end of
  2023

- Branch is now stale

### Plan C: VAST (Trail of Bits)

[VAST](https://github.com/trailofbits/vast)

- Security company working on program analysis and instrumentation

- No MLIR standard dialects, but 2 dialects of its own, HL and LL

  - `-vast-emit-mlir=hl` to generate the high-level dialect

  - `-vast-emit-mlir=llvm` to generate the LLVM MLIR dialect

### Plan D: Use ClangIR project

[ClangIR](https://llvm.github.io/clangir)

- Clang-based C/C++ parser generating the MLIR CIR dialect, pushed by Meta &
  Nvidia

- Pragmatic approach: a Clang `ASTConsumer`

- Reuses the Clang C++ semantic analysis

- Duplicates the proven CodeGen → LLVM IR skeleton as a CodeGen → MLIR CIR
  dialect

- Clever: keeps most of the logic as is, because Clang is quite complicated

- Can lower directly to LLVM IR or to standard MLIR dialects (affine, scf,
  cf…)

- Good traction in the industry: Meta (analyses), Nvidia (OpenACC, OpenMP,
  Flang), Intel (SYCL), Microsoft (HLSL), Google (Polygeist), Trail of Bits
  (VAST), NextSilicon, AMD (AIE++ 😏)…

- In the process of being upstreamed:
  https://discourse.llvm.org/t/rfc-upstreaming-clangir/76587 😃

- Problems

  - Complex rebase-only development process on top of upstream

  - Painful to stay close to upstream → lagging behind

  - Not always up-to-date with upstream compared to MLIR-AIR/MLIR-AIE

    - But able to merge the MLIR-AIR/MLIR-AIE LLVM version into an AMD ClangIR
      branch relatively easily

  - Work in progress, with Meta prioritizing direct LLVM compilation for
    Android: lowering to MLIR std is still in its infancy

- Solutions

  - AMD becomes a ClangIR contributor on CodeGen → MLIR CIR → MLIR std

  - Prioritize work on AIE++ to minimize my ClangIR contribution while still
    giving a taste of AIE++

### Implementation of structs

#### The MLIR `tuple` tragedy

- Built-in MLIR type to represent product types

  - class, struct, C++ `std::tuple`, Python `collections.namedtuple`

- Orphan type in core MLIR! 😦

  - No operation

  - No attribute

  - Cannot be in a `memref`

→ No reuse design pattern

→ Pile of anti-patterns

- Reimplement similar operations over and over again (insert element, extract
  element…)

- Reimplement similar types over and over again in front-ends (`!cir.struct`)
  or back-ends (`!llvm.struct`, `!spirv.struct`)

- Replicate the data layout anti-pattern

- Core feature for C++, yet not implemented in ClangIR → MLIR std 😦
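Replicating the data layout means that every one of these struct types has to recompute the same C-style layout rule: each field is placed at the next offset aligned to its own alignment, and the total size is rounded up to the largest field alignment. A minimal sketch of that rule, assuming the usual x86-64/AArch64 sizes and alignments (`int` 4/4, `double` 8/8, `char` 1/1, `float[5]` 20/4); `Field`, `Layout`, `align_to` and `compute_layout` are hypothetical names, not part of ClangIR or MLIR:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Size and alignment in bytes of one field (hypothetical helper type)
struct Field {
  std::size_t size;
  std::size_t align;  // Must be a power of 2
};

// Result of the layout computation (hypothetical helper type)
struct Layout {
  std::vector<std::size_t> offsets;  // Byte offset of each field
  std::size_t size = 0;              // Total size, including tail padding
  std::size_t align = 1;             // Alignment of the whole struct
};

// Round value up to the next multiple of align (a power of 2)
constexpr std::size_t align_to(std::size_t value, std::size_t align) {
  return (value + align - 1) & ~(align - 1);
}

// C-style layout: fields in declaration order, each one placed at the next
// offset aligned to its own alignment, total size padded to the largest one
Layout compute_layout(const std::vector<Field>& fields) {
  Layout layout;
  std::size_t offset = 0;
  for (const Field& f : fields) {
    assert(f.align != 0 && (f.align & (f.align - 1)) == 0);
    offset = align_to(offset, f.align);
    layout.offsets.push_back(offset);
    offset += f.size;
    layout.align = std::max(layout.align, f.align);
  }
  layout.size = align_to(offset, layout.align);
  return layout;
}
```

For `struct s { int a; double b; char c; float d[5]; }`, `compute_layout({{4, 4}, {8, 8}, {1, 1}, {20, 4}})` gives offsets 0, 8, 16 and 20 and a size of 40, matching the `memref<40xi8>` and the offset 16 of `c` used when lowering `s` in the next subsection.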

#### The MLIR `tuple` strategy

- Experimented with various strategies to lower C/C++ structs

  - Extension of the tuple type itself

    - Very intrusive on other uses

  - Looked at the Polygeist hack relying on `memref<!llvm.struct<…>>`

    - `polygeist.memref2pointer`

    - Compute the access with `llvm.getelementptr`

    - `polygeist.pointer2memref`

  - Manual address computation + `named_tuple.cast`

    - Create a new minimal `named_tuple` dialect

For example:

```c++
struct s {
  int a;
  double b;
  char c;
  float d[5];
};

int main() {
  s v;
  v.c = 'z';
}
```

is lowered to:

```MLIR
%c122_i8 = arith.constant 122 : i8
%2 = named_tuple.cast %alloca_0 : memref<!named_tuple.named_tuple<"s", [i32, f64, i8, tensor<5xf32>]>> to memref<40xi8>
%c16 = arith.constant 16 : index
%view_3 = memref.view %2[%c16][] : memref<40xi8> to memref<i8>
memref.store %c122_i8, %view_3[] : memref<i8>
```
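The constants in this lowering follow from the C++ layout of `s`. Assuming a typical 64-bit ABI such as x86-64 or AArch64 (other ABIs may differ), `c` lives at byte offset 16 and `s` takes 40 bytes, hence `%c16` and `memref<40xi8>`. A minimal plain C++ analogue of the lowered code, checking those layout assumptions at compile time; `store_c_through_byte_view` is a hypothetical name used only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct s {
  int a;
  double b;
  char c;
  float d[5];
};

// Layout assumptions matching the MLIR (typical x86-64/AArch64 ABI)
static_assert(offsetof(s, c) == 16, "c is expected at byte offset 16");
static_assert(sizeof(s) == 40, "s is expected to be 40 bytes");

// C++ analogue of the lowered code:
// - view the storage of v as raw bytes (like named_tuple.cast to memref<40xi8>)
// - select the byte at offset 16 (like memref.view %2[%c16])
// - store 122, i.e. 'z' in ASCII (like memref.store %c122_i8)
void store_c_through_byte_view(s& v) {
  unsigned char* bytes = reinterpret_cast<unsigned char*>(&v);
  const char value = 122;
  std::memcpy(bytes + offsetof(s, c), &value, sizeof value);
  assert(v.c == value);  // The raw-byte store landed in field c
}
```

`std::memcpy` is used instead of a typed pointer store to keep the byte-level write well defined in C++, which is the role `memref<i8>` plays on the MLIR side.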

## TODO list

- Documentation

- Minimal C++ → MLIR AIE e2e example

- Develop the MLIR AIE C++ abstraction header & runtime

- Make the C++ code also compilable with a normal C++ compiler for pure-host
  execution with AIE emulation, for debugging purposes and ease of development.

- Minimal support of structs is required

- More tests & tutorials from AIR & AIE as examples

- Adapt examples or C++ applications to show how to use the framework
  incrementally

- Generalize the C++ header for different DSLs

- Encode in the C++ header some transformation recipes to apply to each DSL
  class

- Create an MLIR library to handle generic lowering of CIR

- Merge the PR upstream & integrate into `aiecc.py`

- Develop the ClangIR → MLIR standard dialect flow

- Develop CIR MLIR transformations to lower to target + standard dialects

- Help with ClangIR upstreaming

- Push for high-level language support in MLIR standard dialects
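As a possible starting point for the pure-host execution item above, here is a minimal sketch of a host-only emulation of the subset of the `aie++` API used by `example.cpp`. Everything in it is an assumption for illustration, not the actual `aie++.hpp`: the class shapes, the hypothetical `aie::npu1` tag type, and the choice of running every tile program sequentially in `run()`. One grounded detail: the `cir.call @_ZN3aie6bufferIiLm8192EEixEm` in the generated MLIR demangles to `aie::buffer<int, 8192ul>::operator[](unsigned long)`, so `operator[]` takes a `std::size_t` here, which is also why the index goes through `cir.cast(integral, ...)` from `!s32i` to `!u64i` before the call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical host-only emulation of the aie++ API used by example.cpp.
// This is NOT the real aie++.hpp: names and behavior are illustrative only.
namespace aie {

struct npu1 {};  // Hypothetical tag standing for the NPU1 device model

// Tile-local buffer of N elements of T, emulated with zero-initialized host memory
template <typename T, std::size_t N>
class buffer {
  std::vector<T> storage = std::vector<T>(N);

 public:
  // std::size_t index, like the !u64i index computed in the generated CIR
  T& operator[](std::size_t i) {
    assert(i < N && "aie::buffer index out of range");
    return storage[i];
  }
  static constexpr std::size_t size() { return N; }
};

// Tile at compile-time coordinates (X, Y) belonging to some device
template <int X, int Y, typename Device>
class tile {
  Device* dev;

 public:
  explicit tile(Device& d) : dev{&d} {}
  static constexpr int x() { return X; }
  static constexpr int y() { return Y; }

  // Allocate a buffer "in the tile memory" (plain host memory here)
  template <typename T, std::size_t N>
  aie::buffer<T, N> buffer() const { return {}; }

  // Register the code to run on this tile core
  template <typename F>
  void program(F&& f) const { dev->add_program(X, Y, std::forward<F>(f)); }
};

template <typename Model>
class device {
  // One program per tile, keyed by (x, y) coordinates
  std::map<std::pair<int, int>, std::function<void()>> programs;

 public:
  template <int X, int Y>
  aie::tile<X, Y, device> tile() { return aie::tile<X, Y, device>{*this}; }

  void add_program(int x, int y, std::function<void()> f) {
    programs[{x, y}] = std::move(f);
  }

  // Emulation choice: run the tile programs sequentially on the host,
  // in (x, y) order, instead of concurrently on AIE cores
  void run() {
    for (auto& entry : programs) entry.second();
  }
};

}  // namespace aie
```

Substituting this sketch for `aie++.hpp`, the body of `example.cpp` builds with a standard C++ compiler, and after `d.run()` the host-side `b[3]` holds 14. Sequential execution is the simplest emulation choice; it does not model concurrent cores or inter-tile synchronization, which a more faithful emulation would need.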
