add std::map fix #4

Open · wants to merge 2,381 commits into main

bruteforceboy (Owner): No description provided.
ghehg and others added 30 commits March 17, 2025 13:53
…#1081)

Now implemented the same way as
[OG](https://github.com/llvm/clangir/blob/7619b20d7461b2d46c17a3154ec4b2f12ca35ea5/clang/lib/CodeGen/CGBuiltin.cpp#L7886),
which is to call the LLVM AArch64 intrinsic that eventually becomes
[an ARM64
instruction](https://developer.arm.com/documentation/ddi0596/2021-03/SIMD-FP-Instructions/ABS--Absolute-value--vector--?lang=en).
However, there is a clear alternative: extend CIR::AbsOp and CIR::FAbsOp
to support vector types and only lower them at the LLVM lowering stage to
either [LLVM::FAbsOP](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrfabs-llvmfabsop)
or [LLVM::AbsOP](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrabs-llvmabsop),
provided the LLVM dialect does the right thing for target lowering by
eventually translating to the LLVM AArch64 intrinsic.

The question is whether that is worth doing.

Anyway, putting this diff up for suggestions and ideas.
…f -fstrict-vtable-pointers (llvm#1138)

Without the `-fstrict-vtable-pointers` flag, `__builtin_launder` is a
no-op. This PR implements that, and leaves the implementation of the
`-fstrict-vtable-pointers` case for the future, when there is a need.

This PR also adapts most of the test cases from the [OG test
case](https://github.com/llvm/clangir/blob/3aed38cf52e72cb51a907fad9dd53802f6505b81/clang/test/CodeGenCXX/builtin-launder.cpp#L1).
I didn't use the test cases in the
[pessimizing_cases](https://github.com/llvm/clangir/blob/3aed38cf52e72cb51a907fad9dd53802f6505b81/clang/test/CodeGenCXX/builtin-launder.cpp#L269)
namespace, as they behave no differently even when
`-fstrict-vtable-pointers` is on.
…1165)

If a record type contains an array of non-record types, we can generate
a copy for it inside the copy constructor, as CodeGen does. CodeGen does
so for arrays of record types where applicable as well, but we'll want
to represent the construction of those explicitly, as outlined in
llvm#1055.
… semantics of C++ initializer list (llvm#1121)

I haven't finished all the work on `cir.initlist`, but I want to get some
feedback on the CIR design to make sure I am on the right track.

Fixed: llvm#777
This PR also changes the implementation of BI__builtin_neon_vshlq_v to
use CIR ShiftOp.
After rebasing, I found these problems with SPEC2017. I was surprised it
hadn't shown problems before; perhaps something was updated on the LLVM
side.
The requirement on the size of the then/else regions of the cir.ternary
operation seems too conservative. As the example shows, the regions may
be expanded during transformations.
…m#1169)

For example, the following reaches
["NYI"](https://github.com/llvm/clangir/blob/c8b626d49e7f306052b2e6d3ce60b1f689d37cb5/clang/lib/CIR/Dialect/Transforms/TargetLowering/LowerFunction.cpp#L348)
when lowering to AArch64:
```
typedef struct {
  union {
    struct {
      char a, b;
    };
    char c;
  };
} A;

void foo(A a) {}

void bar() {
  A a;
  foo(a);
}
```
Currently, the value of the struct becomes a bitcast operation, so we
can simply extend `findAlloca` to be able to trace the source alloca
properly, then use that for the
[coercion](https://github.com/llvm/clangir/blob/c8b626d49e7f306052b2e6d3ce60b1f689d37cb5/clang/lib/CIR/Dialect/Transforms/TargetLowering/LowerFunction.cpp#L341)
through memory. I have also added a test for this case.
Added a few FIXMEs. There are 2 types of FIXMEs:
1. Most of them are missing function call and parameter attributes. I
didn't add one at every missing site of this type, as it would have been
just copy-paste.
2. The FIXME in lambda __invoke(): OG simply returns, but CIR generates a
call to llvm.trap. This is temporary and we will fix it in the near
future. I still list those IRs so that once we fix the codegen of
__invoke, this test fails and gets updated. This way, the test file
becomes a natural test case for the implementation of __invoke.
There are scenarios where we are not emitting cleanups; this commit
starts to pave the way to being more complete in that area. Small
addition of a skeleton here, plus some fixes.

Both `clang/test/CIR/CodeGen/vla.c` and `clang/test/CIR/CodeGen/nrvo.cpp`
now pass with this code path in place.
…lvm#1166)

Close llvm#1131

This is another solution to llvm#1160

This patch reverts llvm#1007 but keeps
its test. The problem described in
llvm#1007 is worked around by skipping
the check that array elements have equivalent types.

We can't model such checks simply by adding another attribute to
`ConstStructAttr`, since the types are aggregated; e.g., we would have to
handle cases like `struct { union { ... } }` and `struct { struct {
union { ... } } }` and so on. To make that work, we would have to
introduce what I called "two type systems" in llvm#1160.

This is not ideal, given that it removes a reasonable check, but it might
not be too problematic since Sema has already checked it. (Of course, we
still face the risk of introducing new bugs.)
…of floating type (llvm#1174)

[PR1132](llvm#1132) implements missing
feature `fpUnaryOPsSupportVectorType`, so revisit this code.

Another change is that I stopped using `cir::isAnyFloatingPointType`, as
it covers types like long double and FP80 which are not supported by the
[builtin's signature](https://clang.llvm.org/docs/LanguageExtensions.html#vector-builtins).

[OG's implementation](https://github.com/llvm/clangir/blob/aaf38b30d31251f3411790820c5e1bf914393ddc/clang/lib/CodeGen/CGBuiltin.cpp#L7527)
provides one common routine to handle all NEON SISD intrinsics, but IMHO
it entangles different things, which hurts readability. Here, we start
with a simple, easy-to-understand approach for this specific case; as we
handle more intrinsics, we may come up with a few simple common patterns.
Co-authored-by: Sirui Mu <msrlancern@gmail.com>
This PR adds `clang::CodeGenOptions` to the lowering context. Similar to
`clang::LangOptions`, the code generation options are currently set to
the default values when initializing the lowering context.

Besides, this PR also adds a new attribute `#cir.opt_level`. The
attribute is a module-level attribute and it holds the optimization
level (e.g. -O1, -Oz, etc.). The attribute is consumed when initializing
the lowering context to populate the `OptimizationLevel` and the
`OptimizeSize` field in the code generation options. CIRGen is updated
to attach this attribute to the module op.
Removes some NYIs, but leaves `assert(false)` where tests are missing.
That looks better, since it is not as scary as an NYI.
This PR adds support for base-to-derived and derived-to-base casts on
pointer-to-data-member values.

Related to llvm#973.
llvm#1194)

Basically, for integer types, the operand order in the emitted LLVM IR
did not match OG's: OG places the constant in the second operand
position. See [OG's order](https://godbolt.org/z/584jrWeYn).
Default assignment operator generation was failing because of memcpy
generation for fields being unsupported. Implement it following
CodeGen's example, as usual. Follow-ups will avoid emitting memcpys for
fields of trivial class types, and extend this to copy constructors as
well.

Fixes llvm#1128
)

Our previous logic here was matching CodeGen, which folds trivial
assignment operator calls into memcpys, but we want to avoid that. Note
that we still end up emitting memcpys for arrays of classes with trivial
assignment operators; llvm#1177 tracks
fixing that.
CodeGen does so for trivial record types as well as non-record types; we
only do it for non-record types.
This is a leftover from when ClangIR was initially focused on analysis
and could ignore default method generation. We now handle default
methods and should generate them in all cases. This fixes several bugs:
- Default methods weren't emitted when emitting LLVM, only CIR.
- Default methods only referenced by other default methods weren't
  emitted.
)

This PR updates the LLVM lowering of loads and stores to const allocas
and makes use of the !invariant.group metadata in the resulting LLVM IR.

The HoistAlloca pass is also updated. The const flag on a hoisted alloca
is removed for now, since its uses are not always invariant. Later PRs
will update the pass to track their invariance.
This PR adds CIRGen support for C++23 `[[assume(expr)]]` statement.
anominos and others added 25 commits March 17, 2025 19:20
This implements the missing feature `cir::setTargetAttributes`.

Although other targets might also need attributes, this PR focuses on
the CUDA-specific ones. For CUDA kernels (on device side, not stubs),
they must have a calling convention of `ptx_kernel`. It is added here.

CUDA kernels, as well as global variables, also involve a lot of NVVM
metadata, which is intended to be dealt with in the same place. It's
marked with a new missing feature here.
This PR implements `__constant__` variables.

llvm#1438 only implements `__device__` and `__shared__` variables.

~~This PR depends on llvm#1445~~
This is part 2 of CUDA lowering. Still more to come!

This PR generates `__cuda_register_globals` for functions only, without
touching variables.

It also fixes two discrepancies mentioned in Part 1, namely:
- Now CIR will not generate registration code if there's nothing to
register;
- `__cuda_fatbin_wrapper` now becomes a constant.
This PR deals with several issues currently present in CUDA CodeGen.
Each of them requires only a few lines to fix, so they're combined in a
single PR.

**Bug 1.**

Suppose we write
```cpp
__global__ void kernel(int a, int b);
```

Then when we call this kernel with `cudaLaunchKernel`, the 4th argument
to that function is something of the form `void *kernel_args[2] = {&a,
&b}`. OG allocates the space for it with `alloca ptr, i32 2`, but that
doesn't seem feasible in CIR, so we allocate `alloca [2 x ptr], i32 1`.
This means there must be an extra GEP compared to OG.

In CIR, that means we must add an `array_to_ptrdecay` cast before trying
to access the array elements. I missed that in llvm#1332 .

**Bug 2.**

We missed a load instruction for the 6th argument to `cudaLaunchKernel`.
It's added back in this PR.

**Bug 3.** 

When we launch a kernel, we first retrieve the return value of
`__cudaPopCallConfiguration`. If it's zero, then the call succeeds and
we should proceed to call the device stub. In llvm#1348 we did exactly the
opposite, calling the device stub only if it's not zero. It's fixed
here.

**Issue 4.**

CallConvLowering is required to make `cudaLaunchKernel` correct. The
codepath is unblocked by adding a `getIndirectResult` at the same place
as OG does -- the function is already implemented so we can just call
it.


After this (and other pending PRs), CIR is now able to compile real CUDA
programs. There are still missing features, which will be followed up
later.
This is Part 3 of registration function generation.

This generates `__cuda_module_dtor`. It cannot be placed in global dtors
list, as treating it as a normal destructor will result in double-free
in recent CUDA versions (see comments in OG). Rather, the function is
passed as callback of `atexit`, which is called at the end of
`__cuda_module_ctor`.
Traditional clang implementation:
https://github.com/llvm/clangir/blob/a1ab6bf6cd3b83d0982c16f29e8c98958f69c024/clang/lib/CodeGen/CGBuiltin.cpp#L3618-L3632

The problem here is that `__builtin_clz` allows an undefined result for
zero input, while `__lzcnt` doesn't. As a result, I have to create a new
CIR op for `__lzcnt`. Since the return types of those two builtins
differ, I decided to change the return type of the current `CIR_BitOp`
to allow the new `CIR_LzcntOp` to inherit from it.

I would like to hear your suggestions. C.c. @Lancern
This PR adds support for compiling builtin variables like `threadIdx`
down to the appropriate intrinsic.

---------

Co-authored-by: Aidan Wong <anominosgamer@gmail.com>
Co-authored-by: anominos <46242743+anominos@users.noreply.github.com>
I have now fixed the test. Earlier I made some commits with other
changes because we were testing something on my fork. This should be
resolved now.
CIR is currently ignoring the `signext` and `zeroext` for function
arguments and return types produced by CallConvLowering.

This PR lowers them to LLVM IR.
I realized I committed a new file with CRLF before. Really sorry about
that >_<

Related: llvm#1404
The choice of adding a separate file imitates that of OG.
This PR removes a useless argument `convertToInt` and removes hardcoded
`Sint32Type`.

I realized I committed a new file with CRLF before. Really sorry about
that >_<
There are some subtleties here.

This is the code in OG:
```cpp
// note: this is different from default ABI
if (!RetTy->isScalarType())
  return ABIArgInfo::getDirect();
```
which says we should return structs directly. It's correct, has the same
behaviour as `nvcc`, and obeys the PTX ABI as well.
The comment dates back to 2013 (see [this
commit](llvm/llvm-project@f9329ff) -- it didn't provide any explanation
either), so I believe it's outdated. I didn't include the comment in
this PR.
…lvm#1486)

The pattern `call {{.*}} i32` mismatches `call i32` due to double spaces
surrounding `{{.*}}`. This patch removes the first space to fix the
failure.
…1487)

This PR resolves an assertion failure in
`CIRGenTypes::isFuncParamTypeConvertible`, which is reached when trying
to emit a vtable entry for a virtual function whose type includes a
pointer-to-member-function.
…lvm#1431)

Implements `::verify` for operations cir.atomic.xchg and
cir.atomic.cmp_xchg

I believe the existing regression tests don't reach the CIR-level type
check failure, and I was not able to construct a case that does.

Most attempts at reproducing a cir.atomic.xchg type check failure were
along the lines of:
```
int a;
long long b,c;
__atomic_exchange(&a, &b, &c, memory_order_seq_cst);
```

They never seem to trigger the failure in `::verify` because they fail
earlier, in function parameter checking:
```
exmp.cpp:7:27: error: cannot initialize a parameter of type 'int *' with an rvalue of type 'long long *'
    7 |     __atomic_exchange(&a, &b, &c, memory_order_seq_cst);
      |                           ^~
```

Closes llvm#1378 .
This PR adds a new boolean flag to the `cir.load` and `cir.store`
operations that distinguishes nontemporal loads and stores. Besides,
this PR also adds support for the `__builtin_nontemporal_load` and
`__builtin_nontemporal_store` intrinsic functions.