merge main into amd-staging #742

ronlieb · 2025-12-03T12:01:14Z

No description provided.

Closes llvm#156161. Assisted-by: Claude Sonnet 4.5 via Claude Code

…lvm#162491) When including a header file, multiple files with the same name may exist across different search paths, like:

    |-- main.cpp
    |-- header.h
    |-- include
    |   └── header.h

The compiler usually picks the first match it finds (typically following MSVC rules for current/include-chain paths first, then regular -I paths), which may not be the user’s intended header. This silent behavior can lead to subtle runtime API mismatches or increase the cost of resolving errors such as “error: use of undeclared identifier”, especially in large projects.

Therefore, this patch tries to provide a diagnostic message without changing the current header selection. It does this by performing an additional search for duplicate filenames across all search paths (both MSVC rules and standard paths). This informs the user about a potential "header shadowing" issue and clarifies which header path was actually used.

Since header searching is much cheaper than file loading, the added overhead should be within an acceptable range -- assuming the diagnostic message is valuable.
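The lookup described above can be sketched on a toy model of the search paths. This is only an illustration, not clang's header search code: `SearchDirSketch`, `LookupResultSketch`, and `findHeaderSketch` are hypothetical names, and directories are modelled as in-memory sets of file names. The first match is still the one selected; later matches are only collected so they can be reported.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one search directory: its path and the files in it.
struct SearchDirSketch {
  std::string Path;
  std::set<std::string> Files;
};

// Result of a lookup: the header that is used, plus any shadowed duplicates.
struct LookupResultSketch {
  std::string Chosen;                // first match; empty if nothing was found
  std::vector<std::string> Shadowed; // later matches that were not selected
};

// Walk the directories in search order. Selection is unchanged (first match
// wins); the extra work is only remembering the duplicates for a diagnostic.
LookupResultSketch findHeaderSketch(const std::vector<SearchDirSketch> &Dirs,
                                    const std::string &Name) {
  LookupResultSketch Result;
  for (const SearchDirSketch &Dir : Dirs) {
    if (!Dir.Files.count(Name))
      continue;
    std::string FullPath = Dir.Path + "/" + Name;
    if (Result.Chosen.empty())
      Result.Chosen = FullPath;
    else
      Result.Shadowed.push_back(FullPath);
  }
  return Result;
}
```

A non-empty `Shadowed` list is the condition under which a "header shadowing" note would be worth emitting in this model.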

Fix the wrong mask type used by G_VSLIDEDOWN_VL.

Currently, DebugCounters work by creating a unique counter ID during registration, and then using that ID to look up the counter information in the global registry. However, this means that anything working with counters has to always go through the global instance. This includes the fast path that checks whether any counters are enabled. Instead, we can drop the counter IDs, and make the counter variables use CounterInfo themselves. We can then directly check whether the specific counter is active without going through the global registry. This is both faster for the fast-path where all counters are disabled, and also faster for the case where only one counter is active (as the fast-path can now still be used for all the disabled counters). After this change, disabled counters become essentially free at runtime, and we should be able to enable them in non-assert builds as well.
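The idea of letting each counter variable carry its own state can be sketched as follows. This is a simplified model and not the LLVM `DebugCounter` implementation: `CounterInfoSketch`, its fields, and the skip/limit semantics are assumptions made for illustration. The point is that the disabled case is decided by a single flag on the counter itself, with no lookup in a global registry.

```cpp
#include <cassert>

// Hypothetical per-counter state, owned by the counter variable itself.
struct CounterInfoSketch {
  bool Active = false; // set only when the user configures this counter
  long Count = 0;      // number of times an active counter has been queried
  long Skip = 0;       // assumed semantics: reject the first Skip queries
  long Limit = 0;      // assumed semantics: then accept the next Limit queries
};

// Returns true if the guarded transformation should run.
inline bool shouldExecute(CounterInfoSketch &Counter) {
  // Fast path: a disabled counter costs one load and one branch.
  if (!Counter.Active)
    return true;
  long Index = Counter.Count++;
  return Index >= Counter.Skip && Index < Counter.Skip + Counter.Limit;
}
```

Because the check reads only the counter's own flag, enabling one counter does not slow down queries on the others in this model.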

…tions (llvm#169269) This commit fixes two crashes in the `-remove-dead-values` pass related to private functions.

Private functions are considered entirely "dead" by the liveness analysis, which drives the `-remove-dead-values` pass. The pass removes dead block arguments from private functions; because private functions are entirely dead, all of their block arguments are removed. However, the pass did not correctly update all users of these dropped block arguments.

1. A side-effecting operation must be removed if one of its operands is dead. Otherwise, the operation would end up with a NULL operand. Note: The liveness analysis would not have marked an SSA value as "dead" if it had reachable side-effecting users. (Therefore, it is safe to erase such side-effecting operations.)
2. A branch operation must be removed if one of its non-forwarded operands is dead (e.g., the condition value of a `cf.cond_br`). Whenever a terminator is removed, a `ub.unreachable` operation is inserted.

This fixes llvm#158760.

This function is called from various .cpp files under `TargetBuiltins/`, and was moved unintentionally into `AMDGPU.cpp` in PR llvm#132252. Move it to a common place.

Fix the propagation added in commit 0d490ae to include all redecls, not only previous ones. This fixes another instance of the assertion "Cannot get layout of forward declarations" in getASTRecordLayout(). Kudos to Alexander Kornienko for providing an initial version of the reproducer that I further simplified. Fixes llvm#170084

…0396)

…t stubs (llvm#170426) These stubs (from 4bdf1aa) don’t actually override anything. Removing them eliminates the need for a local getMemIntrinsicCost() forwarder in llvm#169885.

…lvm#151944) Introduce MO_LaneMask as a new machine operand type. This can be used to hold liveness information at sub-register granularity for register-type operands. We also introduce a new COPY_LANEMASK instruction that uses an MO_LaneMask operand to perform a partial copy from the source register operand. One such use case of MO_LaneMask can be seen in llvm#151123, where it can be used to store live regUnits information corresponding to the source register of the COPY instructions, which can later be used during CopyPhysReg expansion.

) This aligns with the DAP [specification](https://microsoft.github.io/debug-adapter-protocol//specification.html#Base_Protocol_ProtocolMessage). Force it to be an error in test cases.

…semblyInstEmulation (llvm#169631) This will reduce the diff in subsequent patches.

Part of a sequence of PRs:

- [lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit llvm#169630
- [lldb][NFC] Rename forward_branch_offset to branch_offset in UnwindAssemblyInstEmulation llvm#169631
- [lldb] Add DisassemblerLLVMC::IsBarrier API llvm#169632
- [lldb] Handle backwards branches in UnwindAssemblyInstEmulation llvm#169633

commit-id:5e758a22

…170170) Fixes llvm#154772. We previously set `ptx_kernel` for all kernels, but it is incorrect to add `ptx_kernel` to the stub version of a kernel introduced in llvm#115821. This patch copies the workaround from AMDGPU.

…vm#170224) The parser now correctly handles:

- abi_tags attached to operator<<: `operator<<[abi:SOMETAG]`
- abi_tags with "operator" as the tag name: `func[abi:operator]`
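The two cases above can be illustrated with a small string-splitting sketch. This is not lldb's name parser: `NameAndTagSketch` and `splitAbiTagSketch` are hypothetical, and the function only handles a single trailing `[abi:...]` suffix. It shows why the tag has to be located by its `[abi:` marker rather than by scanning for the word "operator" or for `<` characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: the name without its tag, and the tag text.
struct NameAndTagSketch {
  std::string Base;
  std::string Tag; // empty if the name carries no abi_tag
};

// Split a single trailing "[abi:TAG]" suffix off a function name.
NameAndTagSketch splitAbiTagSketch(const std::string &Name) {
  const std::string Marker = "[abi:";
  std::size_t Open = Name.rfind(Marker);
  if (Open == std::string::npos || Name.back() != ']')
    return {Name, ""};
  std::size_t TagStart = Open + Marker.size();
  // The tag runs from just after the marker up to the closing bracket.
  std::string Tag = Name.substr(TagStart, Name.size() - 1 - TagStart);
  return {Name.substr(0, Open), Tag};
}
```

With this split, `operator<<[abi:SOMETAG]` keeps `operator<<` as its base name, and `func[abi:operator]` yields the tag `operator` without being mistaken for an operator function.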

Separate out float wave-reduce intrinsic tests from the overloaded call. Moved float add/sub/min/max ops from: `llvm.amdgcn.reduce.add/sub/min/max` to `llvm.amdgcn.reduce.fadd/fsub/fmin/fmax`.

This will allow the instruction emulation unwinder to reason about instructions that prevent the subsequent instruction from executing.

Part of a sequence of PRs:

- [lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit llvm#169630
- [lldb][NFC] Rename forward_branch_offset to branch_offset in UnwindAssemblyInstEmulation llvm#169631
- [lldb] Add DisassemblerLLVMC::IsBarrier API llvm#169632
- [lldb] Handle backwards branches in UnwindAssemblyInstEmulation llvm#169633

commit-id:bb5df4aa

…#169832) Closes [llvm#169677](llvm#169677) --------- Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com>

- Following llvm#168029. This is a step toward a unified interface for masked/gather-scatter/strided/expand-compress cost modeling.
- Replace the ad-hoc parameter list with a single attributes object.

API change:

```
- InstructionCost getStridedMemoryOpCost(unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask, Align Alignment, TTI::TargetCostKind CostKind, const Instruction *I = nullptr);
+ InstructionCost getStridedMemoryOpCost(MemIntrinsicCostAttributes,
+                                        CostKind);
```

Notes:

- NFCI intended: callers populate MemIntrinsicCostAttributes with the same information as before.

This should match exactly the llvm attributes generated by classic flang.

…hFlat (llvm#170274) BUF instructions can access the scratch address space, so SIInsertWaitCnt needs to be able to track the SCRATCH_WRITE_ACCESS event for such BUF instructions. The release-vgprs.mir test had to be updated because BUF instructions without an MMO are now tracked as a SCRATCH_WRITE_ACCESS. I added an MMO that touches global to keep the test result unchanged. I also added a couple of testcases with no MMO to test the corrected behavior.

… lowering (llvm#169039) So far, memcpy with known size, memcpy with unknown size, memmove with known size, and memmove with unknown size have individual optimized loop lowering implementations, while memset and memset.pattern use an unoptimized loop lowering. This patch extracts the parts of the memcpy lowerings (for known and unknown sizes) that generate the control flow for the loop expansion into an `insertLoopExpansion` function. The `createMemCpyLoop(Unk|K)nownSize` functions then only collect the necessary arguments for `insertLoopExpansion`, call it, and fill the generated loop basic blocks. The immediate benefit of this is that logic from the two memcpy lowerings is deduplicated. Moreover, it enables follow-up patches that will use `insertLoopExpansion` to optimize the memset and memset.pattern implementations similarly to memcpy, since they can use the exact same control flow patterns. The test changes are due to more consistent and useful basic block names in the loop expansion and an improvement in basic block ordering: previously, the basic block that determines if the residual loop is executed would be put at the end of the function, now it is put before the residual loop body. Otherwise, the generated code should be equivalent. This patch doesn't affect memmove; deduplicating its logic would also be nice, but to extract all CF generation from the memmove lowering, `insertLoopExpansion` would need to be able to also create code that iterates backwards over the argument buffers. That would make `insertLoopExpansion` a lot more complex for a code path that's only used for memmove, so it's probably not worth refactoring. For SWDEV-543208.
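The control-flow shape that the memcpy lowerings share (a main loop over wide chunks, then a check that decides whether a residual loop runs) can be written out in plain C++. This is a source-level analogy only, not the IR that `insertLoopExpansion` emits: `copyWithResidualSketch` and the fixed `ChunkSize` of 4 are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy Len bytes using a main loop over fixed-size chunks followed by a
// byte-wise residual loop for whatever does not fill a whole chunk.
void copyWithResidualSketch(unsigned char *Dst, const unsigned char *Src,
                            std::size_t Len) {
  constexpr std::size_t ChunkSize = 4; // assumed width of the main loop access
  const std::size_t MainBytes = Len - Len % ChunkSize;

  // Main loop: one wide access per iteration.
  std::size_t I = 0;
  for (; I < MainBytes; I += ChunkSize)
    std::memcpy(Dst + I, Src + I, ChunkSize);

  // Residual check placed directly before the residual loop body, mirroring
  // the block ordering described above.
  if (I < Len) {
    for (; I < Len; ++I)
      Dst[I] = Src[I];
  }
}
```

A memset-style variant would keep the same two loops and only change what each iteration stores, which is why the same control flow can be reused.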

EmitC currently models C's `&` and `*` operators via its `apply` op, which has several drawbacks:

- Its pre-lvalue semantics combines dereferencing with memory access.
- Representing multiple opcodes (selected by an attribute) in a single op complicates the code by adding a second, attribute-based selection layer on top of MLIR's standard `isa<>` mechanism.

This patch adds two distinct, lvalue-based ops to model these C operators. EmitC passes were converted to use the new ops instead of `apply`, which is now deprecated.

…lvm#170034) Closes llvm#169166 --------- Co-authored-by: Victor Chernyakin <chernyakin.victor.j@outlook.com>

…#169633) This allows the unwinder to handle code with mid-function epilogues where the subsequent code is reachable through a backwards branch. Two changes are required to accomplish this:

1. Do not enqueue the subsequent instruction if the current instruction is a barrier(*).
2. When processing an instruction, stop ignoring branches with negative offsets.

(*) As per the definition in LLVM's MC layer, a barrier is any instruction that "stops control flow from executing the instruction immediately following it". See `MCInstrDesc::isBarrier` in MCInstrDesc.h.

Part of a sequence of PRs:

- [lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit llvm#169630
- [lldb][NFC] Rename forward_branch_offset to branch_offset in UnwindAssemblyInstEmulation llvm#169631
- [lldb] Add DisassemblerLLVMC::IsBarrier API llvm#169632
- [lldb] Handle backwards branches in UnwindAssemblyInstEmulation llvm#169633

commit-id:fd266c13
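The two rules can be tried out on a toy instruction stream. This sketch is not the lldb unwinder: `ToyInst` and `visitReachableSketch` are hypothetical, instructions are addressed by index rather than by byte offset, and only reachability is computed, not unwind state.

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <set>
#include <vector>

// Hypothetical instruction model: an optional relative branch target (in
// instruction indices) and a flag saying control never falls through.
struct ToyInst {
  std::optional<int> BranchOffset;
  bool IsBarrier = false;
};

// Worklist visit starting at instruction 0.
std::set<int> visitReachableSketch(const std::vector<ToyInst> &Insts) {
  const int Count = static_cast<int>(Insts.size());
  std::set<int> Visited;
  std::deque<int> Worklist;
  if (Count > 0)
    Worklist.push_back(0);

  while (!Worklist.empty()) {
    int Index = Worklist.front();
    Worklist.pop_front();
    if (Index < 0 || Index >= Count || !Visited.insert(Index).second)
      continue;
    const ToyInst &Inst = Insts[Index];
    // Rule 2: follow branch targets whether the offset is positive or negative.
    if (Inst.BranchOffset)
      Worklist.push_back(Index + *Inst.BranchOffset);
    // Rule 1: only fall through to the next instruction if this is no barrier.
    if (!Inst.IsBarrier)
      Worklist.push_back(Index + 1);
  }
  return Visited;
}
```

For a layout where instruction 0 jumps forward over a mid-function epilogue and a later instruction jumps back into it, the epilogue is visited only because the negative offset is followed, and the code after the return stays unvisited because the return is a barrier.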

This patch adds the documentation for ScriptedFrameProviders to the lldb website. Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>

…ointSite (llvm#169799) Suppose two threads are performing the exact same step out plan. They will both have an internal breakpoint set at their parent frame. Now suppose both of those breakpoints are at the same address (i.e. the same BreakpointSite). At the end of `ThreadPlanStepOut::DoPlanExplainsStop`, we see this:

```
// If there was only one owner, then we're done. But if we also hit
// some user breakpoint on our way out, we should mark ourselves as
// done, but also not claim to explain the stop, since it is more
// important to report the user breakpoint than the step out
// completion.
if (site_sp->GetNumberOfConstituents() == 1)
  return true;
```

In other words, the plan looks at the number of constituents of the site to decide whether it explains the stop, the logic being that a _user_ might have put a breakpoint there. However, the implementation is not correct; in particular, it will fail in the situation described above. We should only care about non-internal breakpoints that would stop for the current thread.

It is tricky to test this, as it depends on the timing of threads, but I was able to consistently reproduce the issue with a swift program using concurrency.

rdar://165481473
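The difference between counting constituents and counting only relevant user breakpoints can be shown on a toy model. None of the names below are lldb types or APIs: `ConstituentSketch`, `explainsStopByCount`, and `explainsStopByUserBreakpoints` are hypothetical, and the thread filter is reduced to an optional thread ID.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for one breakpoint that shares a site.
struct ConstituentSketch {
  bool IsInternal;                      // set by a thread plan, not by the user
  std::optional<std::uint64_t> OnlyTid; // thread filter; empty means any thread
};

// Rule from the quoted snippet: the plan explains the stop only if the site
// has exactly one constituent.
bool explainsStopByCount(const std::vector<ConstituentSketch> &Site) {
  return Site.size() == 1;
}

// Rule described in the text: the plan explains the stop unless a
// non-internal breakpoint at the site would also stop the current thread.
bool explainsStopByUserBreakpoints(const std::vector<ConstituentSketch> &Site,
                                   std::uint64_t CurrentTid) {
  for (const ConstituentSketch &C : Site) {
    bool StopsHere = !C.OnlyTid || *C.OnlyTid == CurrentTid;
    if (!C.IsInternal && StopsHere)
      return false;
  }
  return true;
}
```

With two internal step-out breakpoints at one site, the count-based rule refuses to explain the stop for either thread, while the filtered rule still does; adding a user breakpoint that applies to the current thread makes the filtered rule defer to it.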

The code for this commit was taken with minimal modification to fit LLVM style from llvm-project/orc-rt/include/CallableTraitsHelper.h and llvm-project/orc-rt/unittests/CallableTraitsHelperTest.cpp (originally committed in 40fce32).

CallableTraitsHelper identifies the return type and argument types of a callable type and passes those to an implementation class template to operate on. E.g. the CallableArgInfoImpl class exposes these types as typedefs.

Porting CallableTraitsHelper from the new ORC runtime will allow us to simplify existing and upcoming "callable-traits" classes in ORC.
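The general technique (decompose a callable type into return and argument types, then hand them to an implementation template) can be sketched as below. This is an independent illustration and not the ORC header: `CallableTraitsSketch` and `ArgInfoImplSketch` are hypothetical names, and only plain functions, function pointers, and non-overloaded `operator()` are covered.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// Primary template: assume C is a class type with one operator() (such as a
// lambda) and recurse on the type of that member function.
template <template <typename...> class ImplT, typename C>
struct CallableTraitsSketch
    : CallableTraitsSketch<ImplT, decltype(&C::operator())> {};

// Plain function types.
template <template <typename...> class ImplT, typename R, typename... Args>
struct CallableTraitsSketch<ImplT, R(Args...)> : ImplT<R, Args...> {};

// Function pointers.
template <template <typename...> class ImplT, typename R, typename... Args>
struct CallableTraitsSketch<ImplT, R (*)(Args...)> : ImplT<R, Args...> {};

// Non-const and const member function pointers (the latter covers lambdas).
template <template <typename...> class ImplT, typename C, typename R,
          typename... Args>
struct CallableTraitsSketch<ImplT, R (C::*)(Args...)> : ImplT<R, Args...> {};

template <template <typename...> class ImplT, typename C, typename R,
          typename... Args>
struct CallableTraitsSketch<ImplT, R (C::*)(Args...) const>
    : ImplT<R, Args...> {};

// Example implementation template: expose the deduced types as typedefs.
template <typename R, typename... Args> struct ArgInfoImplSketch {
  using ReturnType = R;
  using ArgsTuple = std::tuple<Args...>;
};
```

Other traits classes can reuse the same decomposition by supplying a different implementation template, which is the simplification the text refers to.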

…ugh a fma with multiple constants (llvm#170458) Despite 2 of the 3 arguments of the fma intrinsics calls being constant (free shuffle), foldShuffleOfIntrinsics fails to fold the shuffle through

…I=RegF=0, CR!=1 (llvm#170294) In these cases, there are no other GPRs or float registers that would have been backed up before the register homing area, which would have allocated space on the stack for the saved registers. Normally, the register homing part of the prologue consists of 4 nop unwind codes. However, if we haven't allocated stack space for those arguments yet, there's no space to store them in. The previous printout, printing "stp x0, x1, [sp, #-N]!", wouldn't work when interpreted as a nop unwind code. Based on "dumpbin -unwindinfo", and from empirical inspection with RtlVirtualUnwind, it turns out that the homing of argument registers is done outside of the prologue. In these cases, "dumpbin -unwindinfo" prints an annotation "(argument registers homed post-prolog)". Adjust the printout accordingly. In these cases, the later stack allocation (either "stp x29, x30, [sp, #-LocSZ]!" or "sub sp, sp, #LocSZ") is adjusted to include the space for the homed registers (i.e. to be the full size from FrameSize).

…ebsite" This reverts commit bfde296.

This patch adds the documentation for ScriptedFrameProviders to the lldb website. Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>

…m#170449) This is useful since we can highlight the opcode that OpPC points to.

…lvm#157306) Closes llvm#146482.

z1-cciauto · 2025-12-03T12:01:47Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/3066

vbvictor and others added 30 commits December 3, 2025 08:56

[clang-tidy][NFC] Enable readability-any-all-of check (llvm#167134)

d05370e

Closes llvm#156161. Assisted-by: Claude Sonnet 4.5 via Claude Code

[clang-tidy][NFC] Fix miscellaneous clang-tidy warnings (llvm#170424)

73036cf

[RISCV][GISel] Fix legalize G_EXTRACT_SUBVECTOR (llvm#169877)

9f634c6

Fix the wrong mask type used by G_VSLIDEDOWN_VL.

[CIR] Use default attribute printer/parser (NFC) (llvm#170366)

30f479f

Move CodeGenFunction::EmitScalarOrConstFoldImmArg; NFC (llvm#170286)

98182f4

This function is called from various .cpp files under `TargetBuiltins/`, and was moved unintentionally into `AMDGPU.cpp` in PR llvm#132252. Move it to a common place.

[AMDGPU] Avoid undefs in hazard-gfx1250-flat-scr-hi.mir. NFC (llvm#17…

befa4e8

…0396)

[Hexagon][NFC] Drop no-op getMaskedMemoryOpCost/getGatherScatterOpCos…

ae4289f

…t stubs (llvm#170426) These stubs (from 4bdf1aa) don’t actually override anything. Removing them eliminates the need for a local getMemIntrinsicCost() forwarder in llvm#169885.

[lldb-dap] start all sent protocol message from number one. (llvm#170378

c5ecdec

) This aligns with the DAP [specification](https://microsoft.github.io/debug-adapter-protocol//specification.html#Base_Protocol_ProtocolMessage). Force it to be an error in test cases.

[lldb] Fix abi_tag parsing for operator<< and operator-named tags (ll…

8b7a07a

…vm#170224) The parser now correctly handles:

- abi_tags attached to operator<<: `operator<<[abi:SOMETAG]`
- abi_tags with "operator" as the tag name: `func[abi:operator]`

[NFC][AMDGPU] Refactor wave reduce test files (llvm#170440)

7cdb27a

Separate out float wave-reduce intrinsic tests from the overloaded call. Moved float add/sub/min/max ops from: `llvm.amdgcn.reduce.add/sub/min/max` to `llvm.amdgcn.reduce.fadd/fsub/fmin/fmax`.

[clang-tidy] Fix cppcoreguidelines-pro-type-member-init check (llvm…

9296223

…#169832) Closes [llvm#169677](llvm#169677) --------- Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com>

[flang] implement VECTOR VECTORLENGTH directive (llvm#170114)

5ccf8c9

This should match exactly the llvm attributes generated by classic flang.

[clang-tidy] Fix false positive in readability-redundant-typename (l…

4977444

…lvm#170034) Closes llvm#169166 --------- Co-authored-by: Victor Chernyakin <chernyakin.victor.j@outlook.com>

[lldb/docs] Add ScriptingFrameProvider documentation to the website

bfde296

This patch adds the documentation for ScriptedFrameProviders to the lldb website. Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>

[VectorCombine][X86] Add tests showing failure to push a shuffle thro…

6822e3c

…ugh a fma with multiple constants (llvm#170458) Despite 2 of the 3 arguments of the fma intrinsics calls being constant (free shuffle), foldShuffleOfIntrinsics fails to fold the shuffle through

mstorsjo and others added 7 commits December 3, 2025 13:09

Revert "[lldb/docs] Add ScriptingFrameProvider documentation to the w…

4286a47

…ebsite" This reverts commit bfde296.

[lldb/docs] Add ScriptingFrameProvider documentation to the website

0dcbc87

This patch adds the documentation for ScriptedFrameProviders to the lldb website. Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>

[gn build] Port aeb36a9

dd9a516

[clang][bytecode] Accept current PC argument in Function::dump() (llv…

4497c53

…m#170449) This is useful since we can highlight the opcode that OpPC points to.

[clang-tidy] Remove 'clang-analyzer-*' checks from default checks. (l…

d68f543

…lvm#157306) Closes llvm#146482.

merge main into amd-staging

38c7495

ronlieb requested review from a team and dpalermo December 3, 2025 12:01

dpalermo approved these changes Dec 3, 2025

z1-cciauto merged commit 5b3c422 into amd-staging Dec 3, 2025
11 checks passed

z1-cciauto deleted the amd/merge/upstream_merge_20251203053953 branch December 3, 2025 14:36
