forked from llvm/llvm-project
merge main into amd-staging #742
Merged

z1-cciauto merged 37 commits into amd-staging from amd/merge/upstream_merge_20251203053953 on Dec 3, 2025
Conversation
Closes llvm#156161. Assisted-by: Claude Sonnet 4.5 via Claude Code
…lvm#162491) When including a header file, multiple files with the same name may exist across different search paths, like:

|-- main.cpp
|-- **header.h**
|-- include
|   └── **header.h**

The compiler usually picks the first match it finds (typically following MSVC rules for current/include-chain paths first, then regular -I paths), which may not be the header the user intended. This silent behavior can lead to subtle runtime API mismatches or increase the cost of resolving errors such as "error: use of undeclared identifier", especially in large projects. This patch therefore adds a diagnostic without changing the current header selection: it performs an additional search for duplicate filenames across all search paths (both MSVC rules and standard paths), informs the user about the potential "header shadowing", and clarifies which header path was actually used. Since header searching is much cheaper than file loading, the added overhead should be within an acceptable range -- assuming the diagnostic message is valuable.
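A minimal sketch of the scenario this diagnostic targets; the directory layout mirrors the tree above, and the api_version() symbol is purely illustrative:

```
// main.cpp -- sits next to a stale local header.h, while the intended
// header lives in include/ and is passed via -Iinclude.
#include "header.h" // resolves to ./header.h: the includer's directory is
                    // searched before the -I paths, silently shadowing
                    // include/header.h

int main() { return api_version(); } // may mismatch or fail if the local
                                     // copy is out of date
```

Built as `clang++ -Iinclude main.cpp`, the local header wins without any indication; the new diagnostic is meant to surface exactly this situation.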
Fix the wrong mask type used by G_VSLIDEDOWN_VL.
Currently, DebugCounters work by creating a unique counter ID during registration and then using that ID to look up the counter information in the global registry. However, this means that anything working with counters always has to go through the global instance, including the fast path that checks whether any counters are enabled. Instead, we can drop the counter IDs and make the counter variables hold CounterInfo themselves. We can then directly check whether a specific counter is active without going through the global registry. This is faster both for the fast path where all counters are disabled and for the case where only one counter is active (the fast path can still be used for all the disabled counters). After this change, disabled counters become essentially free at runtime, and we should be able to enable them in non-assert builds as well.
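For context, a typical use of the DebugCounter API whose internals this change reworks; the pass and counter names here are hypothetical, and the usage pattern itself is unchanged:

```
#include "llvm/Support/DebugCounter.h"
using namespace llvm;

// Defines a counter variable; after this change it carries its own
// CounterInfo instead of an ID into the global registry.
DEBUG_COUNTER(MyPassCounter, "my-pass-transform",
              "Controls which transforms in MyPass are applied");

static bool tryTransform() {
  // Fast path: when no counters are enabled, this check no longer has to
  // consult the global DebugCounter instance.
  if (!DebugCounter::shouldExecute(MyPassCounter))
    return false;
  // ... perform the transform ...
  return true;
}
```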
…tions (llvm#169269) This commit fixes two crashes in the `-remove-dead-values` pass related to private functions. Private functions are considered entirely "dead" by the liveness analysis that drives the pass, so the pass removes all of their block arguments. However, the pass did not correctly update all users of these dropped block arguments.

1. A side-effecting operation must be removed if one of its operands is dead; otherwise, the operation would end up with a NULL operand. Note: the liveness analysis would not have marked an SSA value as "dead" if it had reachable side-effecting users, so it is safe to erase such side-effecting operations.
2. A branch operation must be removed if one of its non-forwarded operands is dead (e.g., the condition value of a `cf.cond_br`).

Whenever a terminator is removed, a `ub.unreachable` operation is inserted. This fixes llvm#158760.
This function is called from various .cpp files under `TargetBuiltins/`, and was moved unintentionally into `AMDGPU.cpp` in PR llvm#132252. Move it to a common place.
Fix the propagation added in commit 0d490ae to include all redecls, not only previous ones. This fixes another instance of the assertion "Cannot get layout of forward declarations" in getASTRecordLayout(). Kudos to Alexander Kornienko for providing an initial version of the reproducer that I further simplified. Fixes llvm#170084
…t stubs (llvm#170426) These stubs (from 4bdf1aa) don’t actually override anything. Removing them eliminates the need for a local getMemIntrinsicCost() forwarder in llvm#169885.
…lvm#151944) Introduce MO_LaneMask as a new machine operand type. It can be used to hold liveness information at sub-register granularity for register-type operands. We also introduce a new COPY_LANEMASK instruction that uses an MO_LaneMask operand to perform a partial copy from the source register operand. One use case of MO_LaneMask can be seen in llvm#151123, where it is used to store the live regUnits information corresponding to the source register of COPY instructions, which can later be used during CopyPhysReg expansion.
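For illustration, the existing LaneBitmask utility already expresses per-lane coverage of a register; the new operand type carries this kind of mask on an operand. The mask value below is an arbitrary example:

```
#include "llvm/MC/LaneBitmask.h"
using llvm::LaneBitmask;

// A lane mask marks which sub-register lanes of a register are covered.
LaneBitmask LoLanes = LaneBitmask(0x3);  // e.g. the two lowest lanes
LaneBitmask AllLanes = LaneBitmask::getAll();

// A copy that covers only some lanes of its source is a partial copy, the
// case COPY_LANEMASK is meant to represent explicitly.
bool IsPartialCopy = (LoLanes & AllLanes) != AllLanes;
```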
) This aligns with the DAP [specification](https://microsoft.github.io/debug-adapter-protocol//specification.html#Base_Protocol_ProtocolMessage). Force it to be an error in test cases.
…semblyInstEmulation (llvm#169631) This will reduce the diff in subsequent patches.

Part of a sequence of PRs:
- [lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit llvm#169630
- [lldb][NFC] Rename forward_branch_offset to branch_offset in UnwindAssemblyInstEmulation llvm#169631
- [lldb] Add DisassemblerLLVMC::IsBarrier API llvm#169632
- [lldb] Handle backwards branches in UnwindAssemblyInstEmulation llvm#169633

commit-id:5e758a22
…170170) Fixes llvm#154772. We previously set `ptx_kernel` for all kernels, but it is incorrect to add `ptx_kernel` to the stub version of a kernel introduced in llvm#115821. This patch copies the AMDGPU workaround.
…vm#170224) The parser now correctly handles:
- abi_tags attached to operator<<: `operator<<[abi:SOMETAG]`
- abi_tags with "operator" as the tag name: `func[abi:operator]`
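A quick way to exercise the first case, assuming `_ZlsB7SOMETAGii` is the Itanium mangling of a free `operator<<(int, int)` carrying `abi_tag("SOMETAG")` (the mangled name is an assumption, not taken from the patch):

```
#include "llvm/Demangle/Demangle.h"
#include <iostream>

int main() {
  // Expected to print something along the lines of:
  //   operator<<[abi:SOMETAG](int, int)
  std::cout << llvm::demangle("_ZlsB7SOMETAGii") << "\n";
  return 0;
}
```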
Separate out the float wave-reduce intrinsic tests from the overloaded call. Move the float add/sub/min/max ops from `llvm.amdgcn.reduce.add/sub/min/max` to `llvm.amdgcn.reduce.fadd/fsub/fmin/fmax`.
This will allow the instruction emulation unwinder to reason about instructions that prevent the subsequent instruction from executing.

Part of a sequence of PRs:
- [lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit llvm#169630
- [lldb][NFC] Rename forward_branch_offset to branch_offset in UnwindAssemblyInstEmulation llvm#169631
- [lldb] Add DisassemblerLLVMC::IsBarrier API llvm#169632
- [lldb] Handle backwards branches in UnwindAssemblyInstEmulation llvm#169633

commit-id:bb5df4aa
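The underlying MC-layer query such an API would presumably forward to is MCInstrDesc::isBarrier(); the wrapper function below is hypothetical and only shows the shape of the check:

```
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstrInfo.h"

// isBarrier() is set on instructions that stop control flow from reaching
// the instruction immediately following them (unconditional branches,
// returns, traps, ...).
static bool instructionIsBarrier(const llvm::MCInstrInfo &MCII,
                                 const llvm::MCInst &Inst) {
  return MCII.get(Inst.getOpcode()).isBarrier();
}
```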
…#169832) Closes [llvm#169677](llvm#169677) --------- Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com>
- Following llvm#168029. This is a step toward a unified interface for masked/gather-scatter/strided/expand-compress cost modeling.
- Replace the ad-hoc parameter list with a single attributes object.

API change:
```
- InstructionCost getStridedMemoryOpCost(unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask, Align Alignment, TTI::TargetCostKind CostKind, const Instruction *I = nullptr);
+ InstructionCost getStridedMemoryOpCost(MemIntrinsicCostAttributes,
+                                        CostKind);
```

Notes:
- NFCI intended: callers populate MemIntrinsicCostAttributes with the same information as before.
This should exactly match the LLVM attributes generated by classic Flang.
…hFlat (llvm#170274) BUF instructions can access the scratch address space, so SIInsertWaitCnt needs to be able to track the SCRATCH_WRITE_ACCESS event for such BUF instructions. The release-vgprs.mir test had to be updated because BUF instructions without an MMO are now tracked as a SCRATCH_WRITE_ACCESS. I added an MMO that touches the global address space to keep the test result unchanged. I also added a couple of test cases with no MMO to test the corrected behavior.
… lowering (llvm#169039) So far, memcpy with known size, memcpy with unknown size, memmove with known size, and memmove with unknown size have individual optimized loop lowering implementations, while memset and memset.pattern use an unoptimized loop lowering.

This patch extracts the parts of the memcpy lowerings (for known and unknown sizes) that generate the control flow for the loop expansion into an `insertLoopExpansion` function. The `createMemCpyLoop(Unk|K)nownSize` functions then only collect the necessary arguments for `insertLoopExpansion`, call it, and fill the generated loop basic blocks. The immediate benefit is that logic from the two memcpy lowerings is deduplicated. Moreover, it enables follow-up patches that will use `insertLoopExpansion` to optimize the memset and memset.pattern implementations similarly to memcpy, since they can use the exact same control flow patterns.

The test changes are due to more consistent and useful basic block names in the loop expansion and an improvement in basic block ordering: previously, the basic block that determines whether the residual loop is executed was placed at the end of the function; now it is placed before the residual loop body. Otherwise, the generated code should be equivalent.

This patch doesn't affect memmove; deduplicating its logic would also be nice, but to extract all control-flow generation from the memmove lowering, `insertLoopExpansion` would need to be able to also create code that iterates backwards over the argument buffers. That would make `insertLoopExpansion` a lot more complex for a code path that's only used for memmove, so it's probably not worth refactoring.

For SWDEV-543208.
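To make the control-flow shape concrete, here is a minimal, self-contained sketch (not the patch's `insertLoopExpansion` itself) of the kind of count-guarded byte-copy loop such a lowering emits, built with IRBuilder; function and block names are made up:

```
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Assumes F has a single, currently empty entry block, Src/Dst are i8
// pointers, and Len is an integer byte count.
static void emitByteCopyLoop(Function &F, Value *Src, Value *Dst, Value *Len) {
  LLVMContext &Ctx = F.getContext();
  BasicBlock *Entry = &F.getEntryBlock();
  BasicBlock *LoopBB = BasicBlock::Create(Ctx, "copy-loop", &F);
  BasicBlock *ExitBB = BasicBlock::Create(Ctx, "copy-exit", &F);

  IRBuilder<> B(Entry);
  // Guard: skip the loop entirely when there is nothing to copy.
  Value *IsEmpty = B.CreateICmpEQ(Len, ConstantInt::get(Len->getType(), 0));
  B.CreateCondBr(IsEmpty, ExitBB, LoopBB);

  // Loop body: copy one byte per iteration and advance the index.
  B.SetInsertPoint(LoopBB);
  PHINode *Idx = B.CreatePHI(Len->getType(), 2, "index");
  Idx->addIncoming(ConstantInt::get(Len->getType(), 0), Entry);
  Value *SrcPtr = B.CreateInBoundsGEP(B.getInt8Ty(), Src, Idx);
  Value *Byte = B.CreateLoad(B.getInt8Ty(), SrcPtr);
  Value *DstPtr = B.CreateInBoundsGEP(B.getInt8Ty(), Dst, Idx);
  B.CreateStore(Byte, DstPtr);
  Value *Next = B.CreateAdd(Idx, ConstantInt::get(Len->getType(), 1));
  Idx->addIncoming(Next, LoopBB);
  B.CreateCondBr(B.CreateICmpULT(Next, Len), LoopBB, ExitBB);

  // Exit block.
  B.SetInsertPoint(ExitBB);
  B.CreateRetVoid();
}
```

A residual loop for leftover elements, as in the real lowering, would hang a second, similar structure off the exit path.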
EmitC currently models C's `&` and `*` operators via its `apply` op, which has several drawbacks:
- Its pre-lvalue semantics combines dereferencing with memory access.
- Representing multiple opcodes (selected by an attribute) in a single op complicates the code by adding a second, attribute-based selection layer on top of MLIR's standard `isa<>` mechanism.

This patch adds two distinct, lvalue-based ops to model these C operators. EmitC passes were converted to use the new ops instead of `apply`, which is now deprecated.
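For reference, the two C constructs being modeled; the snippet is illustrative of the emitted C, not of the new ops' syntax:

```
#include <cstdint>

int32_t example() {
  int32_t v = 42;
  int32_t *p = &v; // address-of, previously modeled via emitc.apply "&"
  int32_t w = *p;  // dereference, previously modeled via emitc.apply "*"
  return w;
}
```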
…lvm#170034) Closes llvm#169166 --------- Co-authored-by: Victor Chernyakin <chernyakin.victor.j@outlook.com>
…#169633) This allows the unwinder to handle code with mid-function epilogues where the subsequent code is reachable through a backwards branch. Two changes are required to accomplish this:
1. Do not enqueue the subsequent instruction if the current instruction is a barrier(*).
2. When processing an instruction, stop ignoring branches with negative offsets.

(*) As per the definition in LLVM's MC layer, a barrier is any instruction that "stops control flow from executing the instruction immediately following it". See `MCInstrDesc::isBarrier` in MCInstrDesc.h

Part of a sequence of PRs:
- [lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit llvm#169630
- [lldb][NFC] Rename forward_branch_offset to branch_offset in UnwindAssemblyInstEmulation llvm#169631
- [lldb] Add DisassemblerLLVMC::IsBarrier API llvm#169632
- [lldb] Handle backwards branches in UnwindAssemblyInstEmulation llvm#169633

commit-id:fd266c13
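A schematic of how the two rules above change the visit order; the `Instruction` record and the `At` lookup callback are stand-ins, not the unwinder's real types:

```
#include <cstdint>
#include <deque>
#include <optional>
#include <set>

struct Instruction {
  uint64_t Addr = 0;
  uint64_t Size = 0;
  bool IsBarrier = false;              // per MCInstrDesc::isBarrier()
  std::optional<int64_t> BranchOffset; // signed: backward branches allowed
};

// Worklist-based visit of the instructions reachable from Start.
static void visit(const Instruction &Start,
                  const Instruction *(*At)(uint64_t Addr)) {
  std::deque<uint64_t> Worklist{Start.Addr};
  std::set<uint64_t> Seen{Start.Addr};
  auto Enqueue = [&](uint64_t A) {
    if (Seen.insert(A).second)
      Worklist.push_back(A);
  };
  while (!Worklist.empty()) {
    const Instruction *I = At(Worklist.front());
    Worklist.pop_front();
    if (!I)
      continue;
    // (1) A barrier stops fall-through: do not enqueue the next instruction.
    if (!I->IsBarrier)
      Enqueue(I->Addr + I->Size);
    // (2) Enqueue branch targets even when the offset is negative.
    if (I->BranchOffset)
      Enqueue(I->Addr + *I->BranchOffset);
  }
}
```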
This patch adds the documentation for ScriptedFrameProviders to the lldb website. Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
…ointSite (llvm#169799) Suppose two threads are performing the exact same step out plan. They will both have an internal breakpoint set at their parent frame. Now suppose both of those breakpoints are at the same address (i.e. the same BreakpointSite). At the end of `ThreadPlanStepOut::DoPlanExplainsStop`, we see this:
```
  // If there was only one owner, then we're done. But if we also hit
  // some user breakpoint on our way out, we should mark ourselves as
  // done, but also not claim to explain the stop, since it is more
  // important to report the user breakpoint than the step out
  // completion.
  if (site_sp->GetNumberOfConstituents() == 1)
    return true;
```
In other words, the plan looks at the number of constituents of the site to decide whether it explains the stop, the logic being that a _user_ might have put a breakpoint there. However, the implementation is not correct; in particular, it will fail in the situation described above. We should only care about non-internal breakpoints that would stop for the current thread. It is tricky to test this, as it depends on the timing of threads, but I was able to consistently reproduce the issue with a Swift program using concurrency. rdar://165481473
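A hedged sketch of the tighter check: only non-internal (user) breakpoints among the site's constituents should matter. `GetNumberOfConstituents` appears in the quoted code; the accessor used to walk the constituents below is an assumption:

```
#include "lldb/Breakpoint/BreakpointLocation.h"
#include "lldb/Breakpoint/BreakpointSite.h"

// Returns true if any constituent of the site belongs to a user
// (non-internal) breakpoint; only then should the step-out plan decline to
// explain the stop.
static bool SiteHasUserBreakpoint(lldb_private::BreakpointSite &site) {
  for (size_t i = 0; i < site.GetNumberOfConstituents(); ++i) {
    lldb::BreakpointLocationSP loc_sp = site.GetConstituentAtIndex(i);
    if (loc_sp && !loc_sp->GetBreakpoint().IsInternal())
      return true;
  }
  return false;
}
```

The patch additionally restricts this to breakpoints that would actually stop for the current thread, which the sketch above omits.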
The code for this commit was taken, with minimal modification to fit LLVM style, from llvm-project/orc-rt/include/CallableTraitsHelper.h and llvm-project/orc-rt/unittests/CallableTraitsHelperTest.cpp (originally committed in 40fce32). CallableTraitsHelper identifies the return type and argument types of a callable type and passes those to an implementation class template to operate on. E.g., the CallableArgInfoImpl class exposes these types as typedefs. Porting CallableTraitsHelper from the new ORC runtime will allow us to simplify existing and upcoming "callable-traits" classes in ORC.
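An illustrative reduction of the idea (not the ported header itself): decompose a callable's signature and hand the return/argument types to an implementation template that exposes them as typedefs. All names below are stand-ins:

```
#include <tuple>
#include <type_traits>

// Hand the decomposed signature to the implementation template.
template <template <typename...> class ImplT, typename RetT, typename... ArgTs>
struct CallableTraitsDispatch {
  using type = ImplT<RetT, ArgTs...>;
};

// Primary template: functors and lambdas go via their call operator...
template <template <typename...> class ImplT, typename Callable>
struct CallableTraits
    : CallableTraits<ImplT, decltype(&Callable::operator())> {};

// ...function pointers are decomposed directly...
template <template <typename...> class ImplT, typename RetT, typename... ArgTs>
struct CallableTraits<ImplT, RetT (*)(ArgTs...)>
    : CallableTraitsDispatch<ImplT, RetT, ArgTs...> {};

// ...as are const member call operators (the lambda case).
template <template <typename...> class ImplT, typename ClassT, typename RetT,
          typename... ArgTs>
struct CallableTraits<ImplT, RetT (ClassT::*)(ArgTs...) const>
    : CallableTraitsDispatch<ImplT, RetT, ArgTs...> {};

// Example implementation: expose return and argument types as typedefs.
template <typename RetT, typename... ArgTs>
struct CallableArgInfo {
  using return_type = RetT;
  using args_tuple = std::tuple<ArgTs...>;
};

inline auto ExampleFn = [](int, float) { return true; };
using Info = CallableTraits<CallableArgInfo, decltype(ExampleFn)>::type;
static_assert(std::is_same_v<Info::return_type, bool>);
static_assert(std::is_same_v<Info::args_tuple, std::tuple<int, float>>);
```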
…ugh a fma with multiple constants (llvm#170458) Despite 2 of the 3 arguments of the fma intrinsic calls being constant (a free shuffle), foldShuffleOfIntrinsics fails to fold the shuffle through.
…I=RegF=0, CR!=1 (llvm#170294) In these cases, there are no other GPRs or float registers that would have been saved before the register homing area and that would have allocated space on the stack for the saved registers. Normally, the register homing part of the prologue consists of 4 nop unwind codes. However, if we haven't allocated stack space for those arguments yet, there's no space to store them in. The previous printout, "stp x0, x1, [sp, #-N]!", wouldn't work when interpreted as a nop unwind code. Based on "dumpbin -unwindinfo" and empirical inspection with RtlVirtualUnwind, it turns out that the homing of argument registers is done outside of the prologue. In these cases, "dumpbin -unwindinfo" prints the annotation "(argument registers homed post-prolog)". Adjust the printout accordingly. In these cases, the later stack allocation (either "stp x29, x30, [sp, #-LocSZ]!" or "sub sp, sp, #LocSZ") is adjusted to include the space for the homed registers (i.e. to be the full size from FrameSize).
…ebsite" This reverts commit bfde296.
This patch adds the documentation for ScriptedFrameProviders to the lldb website. Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
…m#170449) This is useful since we can highlight the opcode that OpPC points to.
dpalermo (Collaborator) approved these changes on Dec 3, 2025.