Skip to content

Conversation

@ronlieb
Copy link
Collaborator

@ronlieb ronlieb commented Dec 3, 2025

No description provided.

vbvictor and others added 30 commits December 3, 2025 08:56
Closes llvm#156161.
Assisted-by: Claude Sonnet 4.5 via Claude Code
…lvm#162491)

When including a header file, multiple files with the same name may
exist across different search paths, like:
   |-- main.cpp
  |-- **header.h**
  |-- include
  |  └── **header.h**
The compiler usually picks the first match it finds (typically following
MSVC rules for current/include-chain paths first, then regular -I
paths), which may not be the user’s intended header.
This silent behavior can lead to subtle runtime API mismatches or
increase the cost of resolving errors such as “error: use of undeclared
identifier”, especially in large projects.

Therefore, this patch tries to provide a diagnostic message without
changing the current header selection. It does this by performing an
additional search for duplicate filenames across all search paths (both
MSVC rules and standard paths). This informs the user about a potential
"header shadowing" issue and clarifies which header path was actually
used.

Since header searching is much cheaper than file loading, the added
overhead should be within an acceptable range -- assuming the diagnostic
message is valuable.
Fix wrong mask type that used by G_VSLIDEDOWN_VL.
Currently, DebugCounters work by creating a unique counter ID during
registration, and then using that ID to look up the counter information
in the global registry.

However, this means that anything working with counters has to always go
through the global instance. This includes the fast path that checks
whether any counters are enabled.

Instead, we can drop the counter IDs, and make the counter variables use
CounterInfo themselves. We can then directly check whether the specific
counter is active without going through the global registry. This is
both faster for the fast-path where all counters are disabled, and also
faster for the case where only one counter is active (as the fast-path
can now still be used for all the disabled counters).

After this change, disabled counters become essentially free at runtime,
and we should be able to enable them in non-assert builds as well.
…tions (llvm#169269)

This commit fixes two crashes in the `-remove-dead-values` pass related
to private functions.

Private functions are considered entirely "dead" by the liveness
analysis, which drives the `-remove-dead-values` pass.

The `-remove-dead-values` pass removes dead block arguments from private
functions. Private functions are entirely dead, so all of their block
arguments are removed. However, the pass did not correctly update all
users of these dropped block arguments.

1. A side-effecting operation must be removed if one of its operands is
dead. Otherwise, the operation would end up with a NULL operand. Note:
The liveness analysis would not have marked an SSA value as "dead" if it
had a reachable side-effecting users. (Therefore, it is safe to erase
such side-effecting operations.)
2. A branch operation must be removed if one of its non-forwarded
operands is dead. (E.g., the condition value of a `cf.cond_br`.)
Whenever a terminator is removed, a `ub.unrechable` operation is
inserted. This fixes llvm#158760.
This function is called from various .cpp files under `TargetBuiltins/`,
and was moved unintentionally into `AMDGPU.cpp` in PR llvm#132252. Move it
to a common place.
Fix the propagation added in commit 0d490ae to include all redecls,
not only previous ones. This fixes another instance of the assertion
"Cannot get layout of forward declarations" in getASTRecordLayout().

Kudos to Alexander Kornienko for providing an initial version of the
reproducer that I further simplified.

Fixes llvm#170084
…t stubs (llvm#170426)

These stubs (from 4bdf1aa) don’t actually override anything.
Removing them eliminates the need for a local getMemIntrinsicCost()
forwarder in llvm#169885.
…lvm#151944)

Introduce MO_LaneMask as new machine operand type. This can be used to
hold liveness infomation at sub-register granularity for register-type
operands. We also introduce a new COPY_LANEMASK instruction that uses
MO_lanemask operand to perform partial copy from source register
opernad.

One such use case of MO_LaneMask can be seen in llvm#151123, where it can be
used to store live regUnits information corresponding to the source
register of the COPY instructions, later can be used during CopyPhysReg
expansion.
…semblyInstEmulation (llvm#169631)

This will reduce the diff in subsequent patches

Part of a sequence of PRs:
[lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit
llvm#169630
[lldb][NFC] Rename forward_branch_offset to branch_offset in
UnwindAssemblyInstEmulation llvm#169631
[lldb] Add DisassemblerLLVMC::IsBarrier API llvm#169632
[lldb] Handle backwards branches in UnwindAssemblyInstEmulation llvm#169633
commit-id:5e758a22
…170170)

Fixes llvm#154772
We previously set `ptx_kernel` for all kernels. But it's incorrect to
add `ptx_kernel` to the stub version of kernel introduced in llvm#115821.
This patch copies the workaround of AMDGPU.
…vm#170224)

The parser now correctly handles:
- abi_tags attached to operator<<: `operator<<[abi:SOMETAG]`
- abi_tags with "operator" as the tag name: `func[abi:operator]`
Separate out float wave-reduce intrinsic tests from the overloaded call.
Moved float add/sub/min/max ops from:
`llvm.amdgcn.reduce.add/sub/min/max` to
`llvm.amdgcn.reduce.fadd/fsub/fmin/fmax`.
This will allow the instruction emulation unwinder to reason about
instructions that prevent the subsequent instruction from executing.

Part of a sequence of PRs:
[lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit
llvm#169630
[lldb][NFC] Rename forward_branch_offset to branch_offset in
UnwindAssemblyInstEmulation llvm#169631
[lldb] Add DisassemblerLLVMC::IsBarrier API llvm#169632
[lldb] Handle backwards branches in UnwindAssemblyInstEmulation llvm#169633

commit-id:bb5df4aa
…#169832)

Closes [llvm#169677](llvm#169677)

---------

Co-authored-by: EugeneZelenko <eugene.zelenko@gmail.com>
- Following llvm#168029. This is a step toward a unified interface for
masked/gather-scatter/strided/expand-compress cost modeling.
- Replace the ad-hoc parameter list with a single attributes object.

API change:
```
- InstructionCost getStridedMemoryOpCost(unsigned Opcode, Type *DataTy, const Value *Ptr,
                                                                       bool VariableMask, Align Alignment,
                                                                       TTI::TargetCostKind CostKind,
                                                                       const Instruction *I = nullptr);
+ InstructionCost getStridedMemoryOpCost(MemIntrinsicCostAttributes,
+                                                                      CostKind);
```

Notes:
- NFCI intended: callers populate MemIntrinsicCostAttributes with same
information as before.
This should match exactly the llvm attributes generated by classic
flang.
…hFlat (llvm#170274)

BUF instructions can access the scratch address space, so
SIInsertWaitCnt needs to be able
to track the SCRATCH_WRITE_ACCESS event for such BUF instructions.

The release-vgprs.mir test had to be updated because BUF instructions
w/o a MMO are now
tracked as a SCRATCH_WRITE_ACCESS. I added a MMO that touches global to
keep the test result unchanged. I also added a couple of testcases with no MMO to test the corrected behavior.
… lowering (llvm#169039)

So far, memcpy with known size, memcpy with unknown size, memmove with known
size, and memmove with unknown size have individual optimized loop lowering
implementations, while memset and memset.pattern use an unoptimized loop
lowering. This patch extracts the parts of the memcpy lowerings (for known and
unknown sizes) that generate the control flow for the loop expansion into an
`insertLoopExpansion` function. The `createMemCpyLoop(Unk|K)nownSize` functions
then only collect the necessary arguments for `insertLoopExpansion`, call it,
and fill the generated loop basic blocks.

The immediate benefit of this is that logic from the two memcpy lowerings is
deduplicated. Moreover, it enables follow-up patches that will use
`insertLoopExpansion` to optimize the memset and memset.pattern implementations
similarly to memcpy, since they can use the exact same control flow patterns.

The test changes are due to more consistent and useful basic block names in the
loop expansion and an improvement in basic block ordering: previously, the
basic block that determines if the residual loop is executed would be put at
the end of the function, now it is put before the residual loop body.
Otherwise, the generated code should be equivalent.

This patch doesn't affect memmove; deduplicating its logic would also be nice,
but to extract all CF generation from the memmove lowering,
`insertLoopExpansion` would need to be able to also create code that iterates
backwards over the argument buffers. That would make `insertLoopExpansion` a
lot more complex for a code path that's only used for memmove, so it's probably
not worth refactoring.

For SWDEV-543208.
EmitC currently models C's `&` and `*` operators via its `apply` op,
which has several drawbacks:

- Its pre-lvalue semantics combines dereferencing with memory access.

- Representing multiple opcodes (selected by an attribute) in a single
op complicates the code by adding a second, attribute-based selection
layer on top of MLIR's standard `isa<>` mechanism.

This patch adds two distinct, lvalue-based ops to model these C operators.
EmitC passes were converted to use the new ops instead of `apply`, which
is now deprecated.
…lvm#170034)

Closes llvm#169166

---------

Co-authored-by: Victor Chernyakin <chernyakin.victor.j@outlook.com>
…#169633)

This allows the unwinder to handle code with mid-function epilogues
where the subsequent code is reachable through a backwards branch.

Two changes are required to accomplish this:

1. Do not enqueue the subsequent instruction if the current instruction
   is a barrier(*).
2. When processing an instruction, stop ignoring branches with negative
   offsets.

(*) As per the definition in LLVM's MC layer, a barrier is any
instruction that "stops control flow from executing the instruction
immediately following it". See `MCInstrDesc::isBarrier` in MCInstrDesc.h

Part of a sequence of PRs:
[lldb][NFCI] Rewrite UnwindAssemblyInstEmulation in terms of a CFG visit
llvm#169630
[lldb][NFC] Rename forward_branch_offset to branch_offset in
UnwindAssemblyInstEmulation llvm#169631
[lldb] Add DisassemblerLLVMC::IsBarrier API llvm#169632
[lldb] Handle backwards branches in UnwindAssemblyInstEmulation llvm#169633

commit-id:fd266c13
This patch adds the documentation for ScriptedFrameProviders to the
lldb website.

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
…ointSite (llvm#169799)

Suppose two threads are performing the exact same step out plan. They
will both have an internal breakpoint set at their parent frame. Now
supposed both of those breakpoints are in the same address (i.e. the
same BreakpointSite).

At the end of `ThreadPlanStepOut::DoPlanExplainsStop`, we see this:

```
// If there was only one owner, then we're done.  But if we also hit
// some user breakpoint on our way out, we should mark ourselves as
// done, but also not claim to explain the stop, since it is more
// important to report the user breakpoint than the step out
// completion.

if (site_sp->GetNumberOfConstituents() == 1)
  return true;
```

In other words, the plan looks at the name number of constituents of the
site to decide whether it explains the stop, the logic being that a
_user_ might have put a breakpoint there. However, the implementation is
not correct; in particular, it will fail in the situation described
above. We should only care about non-internal breakpoints that would
stop for the current thread.

It is tricky to test this, as it depends on the timing of threads, but I
was able to consistently reproduce the issue with a swift program using
concurrency.

rdar://165481473
The code for this commit was taken with minimal modification to fit LLVM
style from llvm-project/orc-rt/include/CallableTraitsHelper.h and
llvm-project/orc-rt/unittests/CallableTraitsHelperTest.cpp (originally
commited in 40fce32)

CallableTraitsHelper identifies the return type and argument types of a
callable type and passes those to an implementation class template to
operate on. E.g. the CallableArgInfoImpl class exposes these types as
typedefs.

Porting CallableTraitsHelper from the new ORC runtime will allow us to
simplify existing and upcoming "callable-traits" classes in ORC.
…ugh a fma with multiple constants (llvm#170458)

Despite 2 of the 3 arguments of the fma intrinsics calls being constant
(free shuffle), foldShuffleOfIntrinsics fails to fold the shuffle
through
mstorsjo and others added 7 commits December 3, 2025 13:09
…I=RegF=0, CR!=1 (llvm#170294)

In these cases, there are no other GPRs or float registers that would
have been backed up before the register homing area, that would have
allocated space on the stack for the saved registers.

Normally, the register homing part of the prologue consists of 4 nop
unwind codes. However, if we haven't allocated stack space for those
arguments yet, there's no space to store them in. The previous printout,
printing "stp x0, x1, [sp, #-N]!" wouldn't work when interpreted as a
nop unwind code.

Based on "dumpbin -unwindinfo", and from empirical inspection with
RtlVirtualUnwind, it turns out that the homing of argument registers is
done outside of the prologue. In these cases, "dumpbin -unwindinfo"
prints an annotation "(argument registers homed post-prolog)".

Adjust the printout accordingly. In these cases, the later stack
allocation (either "stp x29, x30, [sp, #-LocSZ]! or "sub sp, sp,
#LocSZ") is adjusted to include the space the homed registers (i.e. be
the full size from FrameSize).
This patch adds the documentation for ScriptedFrameProviders to the
lldb website.

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
…m#170449)

This is useful since we can highlight the opcode that OpPC points to.
@ronlieb ronlieb requested review from a team and dpalermo December 3, 2025 12:01
@z1-cciauto
Copy link
Collaborator

@z1-cciauto z1-cciauto merged commit 5b3c422 into amd-staging Dec 3, 2025
11 checks passed
@z1-cciauto z1-cciauto deleted the amd/merge/upstream_merge_20251203053953 branch December 3, 2025 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.