forked from llvm/llvm-project
[pull] main from llvm:main #185
Open
pull wants to merge 343 commits into tanji-dg:main from llvm:main
+41,425 −13,686
Conversation
When folding SQRT(), notice invalid argument exceptions and optionally warn about them.
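As a rough illustration of the idea (this is a hypothetical sketch, not the actual flang folding code; `FoldSqrt` and its warning hook are invented names), the invalid-argument case can be noticed at fold time like this:

```cpp
#include <cmath>
#include <cstdio>
#include <optional>

// Sketch: fold sqrt(x) at compile time, noticing the invalid-argument
// case (x < 0) and optionally warning about it. Returns std::nullopt
// when folding should not happen (the runtime would raise FE_INVALID).
std::optional<double> FoldSqrt(double Arg, bool WarnOnInvalid) {
  if (Arg < 0.0) {
    if (WarnOnInvalid)
      std::fprintf(stderr, "warning: invalid argument to SQRT()\n");
    return std::nullopt;
  }
  return std::sqrt(Arg);
}
```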
Assumed-type dummy argument symbols are never packaged in DataRefs, since the only way they can be used in Fortran is by forwarding them as actual arguments to other calls. When an ActualArgument comprising a forwarded assumed-type dummy argument is presented to ExtractDataRef, it fails, because ExtractDataRef for ActualArgument only handles actual argument expressions (including variable references). Add support for actual arguments that are assumed-type dummy arguments. Fixes #168978.
#170202) This adds a new virtual method `GetScriptedModulePath()` to `ScriptedInterface` that allows retrieving the file path of the Python module containing the scripted object implementation. The Python implementation acquires the GIL and walks through the object's `__class__.__module__` to find the module's `__file__` attribute. This will be used by ScriptedFrame to populate the module and compile unit for frames pointing to Python source files. Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
…168977) If the high bits are assumed 0 for the cast, use zext. Previously we would emit a build_vector and a bitcast with the high element as 0. The zext is more easily optimized. I'm less convinced this is good for globalisel, since you still need to have the inttoptr back to the original pointer type. The default value is 0, though I'm not sure if this is meaningful in the real world. The real uses might always override the high bit value with the attribute.
This PR upstreams the emission of the `cir.await` resume branch. Handling the case where the return value of `co_await` is not ignored is deferred to a future PR, which will be added once `co_return` is upstreamed. Additionally, the `forLValue` variable is always `false` in the current implementation. When support for emitting `coro_yield` is added, this variable will be set to `true`, so that work is also deferred to a future PR.
The `bitfield_insert` function in the OpenCL C library had an incorrect `__CLC_BODY` definition that included the `.inc` file for the `__clc_bitfield_insert` declaration instead of the correct implementation. As a result, the function was not defined at all, leading to linker errors when trying to use it.
…fety soft traps (#169117) This patch tries to upstream code landed downstream in swiftlang#11835. It implements an instrumentation plugin for the `-fbounds-safety` soft trap mode first implemented in swiftlang#11645 (rdar://158088757). That functionality isn't supported in upstream Clang yet, but the instrumentation plugin can be compiled without issue, so this patch tries to upstream it. The included tests are all disabled when the clang used for testing doesn't support `-fbounds-safety`, which means the tests will be skipped. However, it's fairly easy to point LLDB at a clang that does support `-fbounds-safety`. I've done this and confirmed the tests pass. To use a custom clang, the following can be done:
* For API tests, set the `LLDB_TEST_COMPILER` CMake cache variable to point to the appropriate compiler.
* For shell tests, a patch like the following can be applied to set the appropriate compiler:
```
--- a/lldb/test/Shell/helper/toolchain.py
+++ b/lldb/test/Shell/helper/toolchain.py
@@ -271,6 +271,7 @@ def use_support_substitutions(config):
     if config.lldb_lit_tools_dir:
         additional_tool_dirs.append(config.lldb_lit_tools_dir)
+    config.environment['CLANG'] = '/path/to/clang'
     llvm_config.use_clang(
```
The current implementation of `-fbounds-safety` traps works by emitting calls to runtime functions intended to log the occurrence of a soft trap. While the user could just set a breakpoint on these functions, the instrumentation plugin sets one automatically and provides several additional features. When debug info is available:
* It adjusts the stop reason to be the reason for trapping. This is extracted from the artificial frame in the debug info (similar to `-fbounds-safety` hard traps).
* It adjusts the selected frame to be the frame where the soft trap occurred.
When debug info is not available:
* For the `call-with-str` soft trap mode, the soft trap reason is read from the first argument register.
* For the `call-minimal` soft trap mode, the stop reason is adjusted to note it's a bounds check failure, but no further information is given because none is available.
* In this situation the selected frame is not adjusted, because in this mode the user will be looking at assembly, and adjusting the frame would make things confusing.
This patch includes shell and API tests. The shell tests seemed like the best way to test behavior when debug info is missing, because those tests make it easy to disable building with debug info completely. rdar://163230807
COMPILER_RT_STANDALONE_BUILD is set when doing a bootstrapping build through LLVM_ENABLE_RUNTIMES with the CMake source directory being in llvm/. This patch changes the XRay tests to also detect that we have LLVM sources and the llvm-xray tool if we are in a bootstrapping build through the use of the LLVM_TREE_AVAILABLE variable which is set in runtimes/CMakeLists.txt.
…dencies (#170226) This change adds tracking of the StackFrameList that produced each frame by storing a weak pointer (`m_frame_list_wp`) in both `StackFrame` and `ExecutionContextRef`. When resolving frames through `ExecutionContextRef::GetFrameSP`, the code now first attempts to use the remembered frame list instead of immediately calling `Thread::GetStackFrameList`. This breaks circular dependencies that can occur during frame provider initialization, where creating a frame provider might trigger `ExecutionContext` resolution, which would then call back into `Thread::GetStackFrameList()`, creating an infinite loop. The `StackFrameList` now sets `m_frame_list_wp` on every frame it creates, and a new virtual method `GetOriginatingStackFrameList` allows frames to expose their originating list. Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
The test case mismatch was introduced in #135727
Starlark is perfectly capable of doing what we need, and this avoids the dependency on a host Python.
- Remove unused 'CHECK' from the CUDASema test
In f8182f1, we added support for printing a "null" aliasee in AsmWriter, but missed the corresponding support in LLParser.
Support CIR codegen for x86 builtin vec_set.
This PR fixes an issue in `ConcatOpInterface` where `tensor.concat` fails when the concat dimension is dynamic while the result type is static. The fix unifies the computation by using `OpFoldResult`, avoiding the need to separately handle dynamic and static dimension values. Fixes #162776.
This change enables the LoadStoreVectorizer to merge and vectorize
contiguous chains even when their scalar element types differ, as long
as the total bitwidth matches. To do so, we rebase offsets between
chains, normalize value types to a common integer type, and insert the
necessary casts around loads and stores. This uncovers more
vectorization opportunities and explains the expected codegen updates
across AMDGPU tests.
Key changes:
- Chain merging
- Build contiguous subchains and then merge adjacent ones when:
- They refer to the same underlying pointer object and address space.
- They are either all loads or all stores.
- A constant leader-to-leader delta exists.
- Rebasing one chain into the other's coordinate space does not overlap.
- All elements have equal total bit width.
- Rebase the second chain by the computed delta and append it to the
first.
- Type normalization and casting
- Normalize merged chains to a common integer type sized to the total
bits.
- For loads: create a new load of the normalized type, copy metadata,
and cast back to the original type for uses if needed.
- For stores: bitcast the value to the normalized type and store that.
- Insert zext/trunc for integer size changes; use bit-or-pointer casts
when sizes match.
- Cleanups
- Erase replaced instructions and DCE pointer operands when safe.
- New helpers: computeLeaderDelta, chainsOverlapAfterRebase,
rebaseChain, normalizeChainToType, and allElemsMatchTotalBits.
Impact:
- Increases vectorization opportunities across mixed-typed but
size-compatible access chains.
- Large set of expected AMDGPU codegen diffs due to more/changed
vectorization.
This PR resolves #97715.
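The leader-delta and overlap preconditions above can be sketched with hypothetical helpers (the names echo the new helpers listed, but the real pass works on LLVM IR instruction chains, not plain structs; this is a simplified model with byte offsets in a shared coordinate space):

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Offsets are bytes from the common underlying pointer object, so both
// chains already share one coordinate space in this sketch.
struct Access { int64_t Offset; uint64_t SizeBytes; };
using Chain = std::vector<Access>;

// Constant delta between the two chain leaders (first elements), if any.
std::optional<int64_t> computeLeaderDelta(const Chain &A, const Chain &B) {
  if (A.empty() || B.empty())
    return std::nullopt;
  return B.front().Offset - A.front().Offset;
}

// True if any access of B overlaps any access of A once both are viewed
// in the shared coordinate space; merging is only legal when this is false.
bool chainsOverlapAfterRebase(const Chain &A, const Chain &B) {
  for (const Access &X : A)
    for (const Access &Y : B) {
      bool Disjoint = Y.Offset + int64_t(Y.SizeBytes) <= X.Offset ||
                      X.Offset + int64_t(X.SizeBytes) <= Y.Offset;
      if (!Disjoint)
        return true;
    }
  return false;
}
```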
This is an alternative to #159500, breaking that PR down into three separate PRs to make it easier to review. This first PR of the three adds the basic framework for doing type casting to the DIL code, but it does not actually do any casting: in this PR the DIL parser only recognizes builtin type names, and the DIL interpreter does not do anything except return the original operand (no casting). The second and third PRs will add most of the type parsing, and do the actual type casting, respectively.
Extends the work started in #165714 by supporting team reductions. Similar to what was done in #165714, this PR introduces proper allocations, loads, and stores for by-ref reductions in teams-related callbacks:
* `_omp_reduction_list_to_global_copy_func`,
* `_omp_reduction_list_to_global_reduce_func`,
* `_omp_reduction_global_to_list_copy_func`, and
* `_omp_reduction_global_to_list_reduce_func`.
We often don't need the APSInt at all, so add a version that pops the integral from the stack and just static_casts to uint64_t.
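A minimal sketch of that shortcut (the `EvalStack` and `popToUInt64` names here are hypothetical stand-ins for the interpreter's actual API, and a plain `int64_t` stands in for the APSInt machinery being bypassed):

```cpp
#include <cstdint>
#include <vector>

// Sketch: instead of materializing a full arbitrary-precision APSInt,
// pop the integral value and static_cast it straight to uint64_t.
struct EvalStack {
  std::vector<int64_t> Values; // stand-in for the interpreter's stack slots

  int64_t popSigned() {
    int64_t V = Values.back();
    Values.pop_back();
    return V;
  }

  // The cheap path: no APSInt allocation, just a cast.
  uint64_t popToUInt64() { return static_cast<uint64_t>(popSigned()); }
};
```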
This test fails now that it actually runs: `ld.lld: error: undefined symbol: std::__throw_system_error(int)`
The recent changes in the MLIR TableGen interface for generated
OpTy::build functions involve a new OpTy::create function that is
generated passing arguments without forwarding. This is problematic with
arguments that are move-only, such as `std::unique_ptr`. My particular
use case involves `std::unique_ptr<mlir::Region>`, which is desirable as
the `mlir::OperationState` object accepts calls to
`addRegion(std::unique_ptr<mlir::Region>)`.
In Discord, the use of `extraClassDeclarations` was suggested, which I
may go with regardless since I still have to define the builder function
anyway, but perhaps you would consider this trivial change as it
supports a broader class of argument types for this approach.
Consider the declaration in TableGen:
```
let builders = [
OpBuilder<(ins "::mlir::Value":$cdr,
"::mlir::ValueRange":$packs,
"std::unique_ptr<::mlir::Region>":$body)>
];
```
Which currently generates:
```cpp
ExpandPacksOp ExpandPacksOp::create(::mlir::OpBuilder &builder, ::mlir::Location location, ::mlir::Value cdr, ::mlir::ValueRange packs, std::unique_ptr<::mlir::Region> body) {
::mlir::OperationState __state__(location, getOperationName());
build(builder, __state__, std::forward<decltype(cdr)>(cdr), std::forward<decltype(packs)>(packs), std::forward<decltype(body)>(body));
auto __res__ = ::llvm::dyn_cast<ExpandPacksOp>(builder.create(__state__));
assert(__res__ && "builder didn't return the right type");
return __res__;
}
```
With this change it will generate:
```cpp
ExpandPacksOp ExpandPacksOp::create(::mlir::OpBuilder &builder, ::mlir::Location location, ::mlir::Value cdr, ::mlir::ValueRange packs, std::unique_ptr<::mlir::Region>&&body) {
::mlir::OperationState __state__(location, getOperationName());
build(builder, __state__, static_cast<decltype(cdr)>(cdr), std::forward<decltype(packs)>(packs), std::forward<decltype(body)>(body));
auto __res__ = ::llvm::dyn_cast<ExpandPacksOp>(builder.create(__state__));
assert(__res__ && "builder didn't return the right type");
return __res__;
}
```
Another option could be to make this function a template but then it
would not be hidden in the generated translation unit. I don't know if
that was the original intent. Thank you for your consideration.
…ebsite" This reverts commit bfde296.
This patch adds the documentation for ScriptedFrameProviders to the lldb website. Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
…0449) This is useful since we can highlight the opcode that OpPC points to.
…9980) This patch implements constant evaluation support for the following X86 intrinsics: - _mm_cvtpd_ps, _mm256_cvtpd_ps (Packed Double to Float) - _mm_cvtsd_ss (Scalar Double to Float merge) - Masked variants of the above It implements the strict "Exact and Finite" rule: conversions that are inexact, infinite, or NaN are rejected in constexpr contexts. Fixes #169370
…rs (#170315) The "half spanning" legalization of extract_subvector is only valid for fixed-length vectors. This patch disables it for scalable vectors and makes more careful use of ElementCount in the lowering. Fixes regression from #154101, which was encountered here: #166748 (comment) Note: We could optimize this case given the known vscale, but this patch only attempts to fix the miscompile.
We've had internal test failures since #166188 landed. The root cause is that `PPChainedCallbacks::EmbedFileNotFound()` incorrectly calls `PPCallbacks::FileNotFound()` instead of `PPCallbacks::EmbedFileNotFound()`.
There are two fixes: 1. Clear kill flags for `FalseReg` in foldVMergeToMask; otherwise we can't pass the MachineVerifier because of using a killed virtual register. 2. Restrict `lookThruCopies` to only look through COPYs with one non-debug use. This was found when backporting #170070 to the 21.x branch.
…insics. (#161840) This is a follow-up patch to #157680. In this patch, we add explicit bitcasts to floating-point type when lowering saturating add/sub and shift NEON scalar intrinsics using SelectionDAG, so they can be picked up by the patterns added in the first part of this series. To do that, we have to create new nodes for these intrinsics, which operate on floating-point types, and wrap them in bitcast nodes.
Turn off "in EQUIVALENCE" check for processing of array subscripts, since subscripts themselves are not part of the EQUIVALENCE. Fixes #169590
…170370) The convention is to change the spelling from snake_case to UpperCamel, and use the result as a stem in derived names, e.g.
- spelling is "some_clause" -> stem is SomeClause
- spelling is "someclause" -> stem is Someclause
The member of the OmpClause variant is <stem> itself, e.g. Looprange as in parser::OmpClause::Looprange. The specific clause class name is Omp<stem>Clause, e.g. OmpLooprangeClause.
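The spelling-to-stem convention described above can be sketched as follows (`clauseStem` is a hypothetical helper, not the actual TableGen emitter code):

```cpp
#include <cctype>
#include <string>

// Sketch: snake_case clause spelling -> UpperCamel stem, as used to
// derive names like parser::OmpClause::<stem> and Omp<stem>Clause.
std::string clauseStem(const std::string &Spelling) {
  std::string Stem;
  bool AtWordStart = true;
  for (char C : Spelling) {
    if (C == '_') {
      AtWordStart = true; // underscore starts a new word, and is dropped
      continue;
    }
    Stem += AtWordStart ? char(std::toupper(static_cast<unsigned char>(C)))
                        : char(std::tolower(static_cast<unsigned char>(C)));
    AtWordStart = false;
  }
  return Stem;
}
```

Note that a spelling without underscores only capitalizes its first letter, which matches the "someclause" -> Someclause example above.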
To be able to test lit without having an LLVM build configuration, we need to support invocations that do not go through lit.site.cfg and thus don't have an llvm_config set up.
before, with the options `AlignConsecutiveDeclarations` and
`AlignConsecutiveAssignments` enabled
```C++
veryverylongvariablename = somethingelse;
shortervariablename = anotherverylonglonglongvariablename + //
somevariablethatwastoolongtofitonthesamerow;
double i234 = 0;
auto v = false ? type{}
: type{
1,
};
```
after
```C++
veryverylongvariablename = somethingelse;
shortervariablename = anotherverylonglonglongvariablename + //
somevariablethatwastoolongtofitonthesamerow;
double i234 = 0;
auto v = false ? type{}
: type{
1,
};
```
Fixes #126873.
Fixes #57612.
Previously, the part for determining whether aligning a line should move
the next line relied on having a pair of tokens, such as parentheses,
surrounding both lines. There are often no such tokens, for example in
the first block above. This patch removes the requirement for those
tokens.
Now the program keeps track of how the position is calculated. The
alignment step moves the next line if its position is based on a column
to the right of the token that gets aligned.
The column that the position of the line is based on is more detailed
than the `IsAligned` property that the program used before this patch.
It enables the program to handle cases where parts that should not
usually move with the previous line and parts that should are nested
like in the second block above. That is why the patch uses it instead of
fake parentheses.
In the sample below, the `private` identifier is the name of the type,
and the `try` identifier is the name of the variable.
new
```SystemVerilog
begin
private try;
end
```
old
```SystemVerilog
begin
private
try
;
end
```
…or enum class (#168092) Fixes #163224 --- This patch addresses the issue by correcting the caret insertion location for attributes incorrectly positioned before an enum. The location is now derived from the associated `EnumDecl`: for named enums, the attribute is placed before the identifier, while for anonymous enum definitions, it is placed before the opening brace, with a fallback to the semicolon when no brace is present. For example: ```cpp [[nodiscard]] enum class E1 {}; ``` is now suggested as: ```cpp enum class [[nodiscard]] E1 {}; ```
… in SymbolTableTest.cpp (NFC)
#132365) [MemoryBuiltins] Consider index type size when aggregating gep offsets. The main goal here is to fix some bugs seen with the LowerConstantIntrinsics pass and the lowering of llvm.objectsize. In ObjectSizeOffsetVisitor::computeImpl we are using an external analysis together with stripAndAccumulateConstantOffsets. The idea is to compute the Min/Max value of individual offsets within a GEP. The bug solved here is that when doing the Min/Max comparisons, the external analysis wasn't considering the index type size (given by the data layout); it was simply using the type from the IR. Since GEP indices are defined to be sign-extended or truncated to the index type, we need to consider the index type size in the external analysis. This solves a regression (false ubsan warnings) seen after commit 02b8ee2 (#117849).
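The index-width issue can be illustrated with a hypothetical `toIndexWidth` helper: an offset compared at its IR width can differ from the same offset truncated and sign-extended to a narrower index type, which is exactly what skews the Min/Max comparisons:

```cpp
#include <cstdint>

// Sketch: model GEP index semantics by truncating an offset to the
// pointer index width and then sign-extending it back to 64 bits.
int64_t toIndexWidth(int64_t Offset, unsigned IndexBits) {
  if (IndexBits >= 64)
    return Offset;
  uint64_t Mask = (uint64_t(1) << IndexBits) - 1;
  uint64_t Trunc = uint64_t(Offset) & Mask; // truncate to index width
  uint64_t SignBit = uint64_t(1) << (IndexBits - 1);
  // Standard trick: xor/sub with the sign bit sign-extends the value.
  return int64_t(Trunc ^ SignBit) - int64_t(SignBit);
}
```

For example, with a 32-bit index type the offset 2^32 wraps to 0 and 2^31 becomes negative, so comparing the raw 64-bit values would pick the wrong Min/Max.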
…IRTranslation.cpp (NFC)
This enables the use of debug counters in (non-assertion) release builds. This is useful to enable debugging without having to switch to an assertion-enabled build, which may not always be easy. After some recent improvements, always supporting debug counters no longer has measurable overhead.
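A minimal sketch of debug-counter semantics (assumed here to follow llvm::DebugCounter's skip/count model; this simplified struct is not the actual implementation): the counted action executes only for occurrences in the window [Skip, Skip + Count).

```cpp
// Sketch of a debug counter: skip the first `Skip` occurrences, then
// allow the next `Count`, then suppress everything after.
struct DebugCounter {
  long Skip = 0;
  long Count = 0;
  long Seen = 0;

  bool shouldExecute() {
    long N = Seen++;
    return N >= Skip && N < Skip + Count;
  }
};
```

A counter like this lets you bisect which single transformation in a release build introduces a miscompile, without rebuilding with assertions.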
… colon (#169246) Fixes #167905 --- This patch addresses an issue where invalid nested name specifier sequences containing a single colon (`a:c::`) could be treated during recovery as valid scope specifiers, which in turn led to a crash https://github.com/llvm/llvm-project/blob/c543615744d61e0967b956c402e310946d741570/clang/lib/Parse/ParseExprCXX.cpp#L404-L418 For malformed inputs like `a:c::`, the single colon recovery incorrectly triggers and produces an `annot_cxxscope`. When tentative parsing later runs https://github.com/llvm/llvm-project/blob/996213c6ea0dc2e47624c6b06c0833a882c1c1f7/clang/lib/Parse/ParseTentative.cpp#L1739-L1740 the classifier returns `Ambiguous`, which doesn't stop parsing. The parser then enters the https://github.com/llvm/llvm-project/blob/996213c6ea0dc2e47624c6b06c0833a882c1c1f7/clang/lib/Parse/ParseTentative.cpp#L1750-L1752 and consumes the invalid scope annotation, eventually reaching `EOF` and crashing.
#165590) Currently OptimizeLoopTermCond can only convert a cmp instruction to using a postincrement induction variable, which means it can't handle predicated loops where the termination condition comes from get_active_lane_mask. Relax this restriction so that we can handle any kind of instruction, though only if it's the instruction immediately before the branch (except for possibly an extractelement).
…170376) This is a slightly different API than ConstantRange's areInsensitiveToSignednessOfICmpPredicate. The only actual difference (beyond naming) is the handling of empty ranges (i.e. unreachable code). I wanted to keep the existing SCEV behavior for the unreachable code as we should be folding that to poison, not reasoning about samesign. I tried the other variant locally, and saw no test changes.
Index ops cause some issues during SIMT distribution because they don't have the `Elementwise` mappable trait. This PR replaces all index arithmetic ops with matching `arith` dialect ops.
Adds ShowMessageParams to LSP support according to the [LSP specification](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#showMessageRequestParams).
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )