Skip to content

Fix hoist-static-allocs when function parameters are returned #4

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3,528 commits into
base: main
Choose a base branch
from

Conversation

Menooker
Copy link
Owner

buffer-results-to-out-params pass will have a nullptr error when hoist-static-allocs option is on, when the return value of a function is a parameter of the function. This PR fixes this issue and let the pass remove the return value in the ReturnOp when

  • the value type is memref
  • and the value is a function parameter

@Menooker
Copy link
Owner Author

@ZhennanQin @xurui1995 Hi, this is a bug fix of upstream MLIR. The bug was found by @xurui1995. Would you please help review it? Thanks!

Copy link

@xurui1995 xurui1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

dfszabo and others added 28 commits August 29, 2024 22:53
)

GCC supports code like "asm volatile ("" : "=r" (i) : "0" (f))" where i
is integer type and f is floating point type. Currently this code
produces an error with Clang. The change allows mixed scalar types
between input and output constraints.

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
…lvm#106453)

The primary motivation behind this is to allow the enum type to be
referred to earlier in the Sema.h file which is needed for llvm#106321.

It was requested in llvm#106321 that a scoped enum be used (rather than
moving the enum declaration earlier in the Sema class declaration).
Unfortunately doing this creates a lot of churn as all use sites of the
enum constants had to be changed. Appologies to all downstream forks in
advanced.

Note the AA_ prefix has been dropped from the enum value names as they
are now redundant.
Summary:
This provides the `_l` variants for the `stdlib.h` functions. These are
just copies of the same entrypoint and don't do anything with the locale
information.
Summary:
This adds the locale variants of the string functions. As previously,
these do not use the locale information at all and simply copy the
non-locale version which expects the "C" locale.
…#106469)

add true16/fake16 flag to gfx12 dasm tests including vop1, vop1_dpp,
vop3_from_vop1 and vop3_from_vop1_dpp. This is a test only change.
We can do slightly better on the neutral value when we have nsz.
…ectorized.

Need to consider the maximum type size in the graph before doing attempt
for the vectorization of non-power-of-2 number of elements, which may be
  less than MinVF.
assume all functions used in a HWASan module potentially touch shadow
memory (and short granules).
…ion (llvm#106568)

As reported in
llvm#105648 (comment)
commit 08ad8dc7154bf3ab79f750e6d5fb7df597c7601a
introduced a nullptr dereference in the case when store contains a
binding to a symbol that has no origin region associated with it, such
as the symbol generated when a pointer is passed to an opaque function.
This ensures we skip any instructions identified to be ignored by the
legacy cost model as well. Fixes a divergence between legacy and
VPlan-based cost model.

Fixes llvm#106417.
…itializerToDecl(). (llvm#106235)

Control flow analysis performed by a static analysis tool revealed the
potential for null pointer dereferences to occur in conjunction with the
`Init` parameter in `Sema::AddInitializerToDecl()`. On entry to the
function, `Init` is required to be non-null as there are multiple
potential branches that unconditionally dereference it. However, there
were two places where `Init` is compared to null thus implying that
`Init` is expected to be null in some cases. These checks appear to be
purely defensive checks and thus unnecessary. Further, there were
several cases where code checked `Result`, a variable of type
`ExprResult`, for an invalid value, but did not check for a valid but
null value and then proceeded to unconditionally dereference the
potential null result. This change elides the unnecessary defensive
checks and changes some checks for an invalid result to instead branch
on an unusable result (either an invalid result or a valid but null
result).
…vm#106496)

These combines call getNumElements() which isn't valid for scalable
vectors.
One of the tests for the new fake use intrinsic are failing on darwin
buildbots due to relying on behaviour for their expected triple; this
commit adds explicit triples to the few remaining fake-use tests that
didn't have them.

Fixes commit 3d08ade (llvm#86149).

Buildbot failures:
https://lab.llvm.org/buildbot/#/builders/23/builds/2505
…m#106335)

As agreed on itanium-cxx-abi/cxx-abi#109 these
placeholders should be mangled as a `template-prefix` production.

```
    <template-prefix> ::= <template unqualified-name>           # global template
                      ::= <prefix> <template unqualified-name>  # nested template
                      ::= <template-param>                      # template template parameter
                      ::= <substitution>
```

Previous to this patch, the template template parameter case was not
handled, and template template parameters were incorrectly being handled
as unqualified-names.

Before llvm#95202, DeducedTemplateType was not canonicalized correctly, so
that template template parameter declarations were retained
uncanonicalized.

After llvm#95202, they are correctly canonicalized, but this now leads to
these TTPs being anonymous entities, where the mangling implementation
correctly doesn't expect an anonymous declaration of this kind, leading
to a crash.

Fixes llvm#106182.
…t internal shell syntax (llvm#105902)

This patch rewrites a test that uses command substitution `$()` and the
`stat` command, which are not supported by lit's internal shell. Instead
of using this syntax to perform the file size comparison done in this
test, a Python script is used instead to perform the same operation.

Fixes llvm#102384.
This preserves the semantis of fneg and matches what we do in
LegalizeDAG.

I kept the legal FSUB check to force unrolling for some targets that
don't have FSUB but have XOR. On Aarch64, using xor broke some tests that
expected to see a (v1f64 (fma (insertvector_elt (f64 (fneg
(extractvectorelt X)))))) pattern.
…und.

If the value is used in Scalar several times, the first attempt to find
its position in the node (if ReuseShuffleIndices and ReorderIndices not
empty) may fail. In this case need to find another copy of the same
value and try again.
Fixes llvm#106626
The behavior deliberately mimics that of clang. Ideally, -print-pipeline-passes
should be a first-class driver option. Notes to this effect have been added in 
the appropriate places in both flang and clang.

---------

Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
…vm#106600)

`uitofp` and `sitofp` instructions use the default rounding mode which
is defined as round-to-nearest.
The interceptor types are supposed to match size_t (and the non-Windows
ssize_t) exactly, but on 32-bit Windows `size_t` uses `unsigned int`
whereas `SIZE_T` is `unsigned long`. The current definition results in
`uptr` not matching `uintptr_t` since we otherwise get typedef
redefinition errors. Work around this by using a #define instead of
a typedef when defining SIZE_T.

It would probably be cleaner to stop using these uppercase types, but
that is a rather invasive change and this one is the minimal change to
allow uptr to match uintptr_t on Windows.

To ensure this compiles on Windows, we also remove the interceptor.h
defines of uptr (that do not always match __sanitizer::uptr) and rely
on __sanitizer::uptr instead. The interceptor types most likely predate
those other types so clean up the unnecessary definition while here.

This also reverts commit 18e06e3 and
commit bb27dd8.

Reviewed By: mstorsjo, vitalybuka

Pull Request: llvm#106311
philnik777 and others added 25 commits September 1, 2024 13:39
)

This doesn't seem to have any use other than the possibility of merge
conflicts and accidentally forgetting to update `NUM_PREDEF_DECL_IDS`.
When the shuffle masks are `PoisonMaskElem`, there is not need to check
the cost of `SK_ExtractSubvector`. It is free. Otherwise, it will cause
the compiler to crash.

Assertion `(Idx + EltsPerVector) <= alignTo(NumElts, EltsPerVector) &&
"SK_ExtractSubvector index out of range"' failed.
llvm#106889)

We can't assume closed world even in full LTO post-link stage. It is
only true
if we are building a "GPU executable". However, AMDGPU does support
"dyamic
library". I'm not aware of any approach to tell if it is relocatable
link when
we create the pass. For now let's revert the patch as it is currently
breaking things.
We can re-enable it once we can handle it correctly.
NEON has non-IEEE compliant denormal flushing and the compiler should
check if it safe to vectorize instructions for NEON in non-fast math
mode.

Fixes llvm#106909
Implement cost computation for VPWidenCallRecipe. In some cases, targets
use argument info to compute intrinsic costs. If all operands of the
call are VPValues with an underlying IR value, use the IR values as
arguments.

PR: llvm#106731
This patch reduces the memory usage for import lists by employing
memory-efficient data structures.

With this patch, an import list for a given destination module is
basically DenseSet<uint32_t> with each element indexing into the
deduplication table containing tuples of:

  {SourceModule, GUID, Definition/Declaration}

In one of our large applications, the peak memory usage goes down by
9.2% from 6.120GB to 5.555GB during the LTO indexing step.

This patch addresses several sources of space inefficiency associated
with std::unordered_map:

- std::unordered_map<GUID, ImportKind> takes up 16 bytes because of
  padding even though ImportKind only carries one bit of information.

- std::unordered_map uses pointers to elements, both in the hash table
  proper and for collision chains.

- We allocate an instance of std::unordered_map for each
  {Destination Module, Source Module} pair for which we have at least
  one import.  Most import lists have less than 10 imports, so the
  metadata like the size of std::unordered_map and the pointer to the
  hash table costs a lot relative to the actual contents.
We have special handling for this in type legalization, but we
didn't have a test.
Branches exiting the loop will remain regardless, so don't consider them
in collectValuesToIgnore.

This fixes another divergence between legacy and VPlan-based cost model.

Fixes llvm#106780.
This is for an upcoming change to the threshold on Apple targets for
using a constant pool for FP literals versus building them with integer
moves.

This file is based on literal_pools_float.ll. I tried to bolt on to the
existing test, but it got messy as that file is already testing a matrix
of combinations, so creating this new file instead.
…=` (llvm#106803)

Time trace profiler support was added into LLVMgold in
cd3255a. This patch adds its
`-plugin-opt` counterpart, which is just an alias to `--time-trace=`,
into LLD for compatibility.
Widening/narrowing the source data type to match the destination data
type may require multiple steps.
To model the costs, the patch generated the interim type by following
the logic in RISCVTargetLowering::lowerVPFPIntConvOp.
`HighMask` is the value that sets bits from `Msb+1` to 63 to 1, while
the other bits are set to 0.
Currently the option prints a path to a nonexistent directory with the
full triple, `lib/powerpc64-ibm-aix7.2.0.0`. It should only be
`lib/aix`.
…d. NFC (llvm#106671)

This predicate isn't bound to the scheduler model and and we may want to
reuse it in the future. We already moved it to reuse it in our
downstream.
…m#106890)

`FindInstantiatedDecl()` relies on the `CurContext` to find the
corresponding class template instantiation for a class template
declaration.

Previously, we pushed the semantic declaration context for constraint
comparison, which is incorrect for constraints on friend declarations.
In issue llvm#78101, the semantic context of the friend is the TU, so we
missed the implicit template specialization `Template<void, 4>` when
looking for the instantiation of the primary template `Template` at the
time of checking the member instantiation; instead, we mistakenly picked
up the explicit specialization `Template<float, 5>`, hence the error.

As a bonus, this also fixes a crash when diagnosing constraints. The
DeclarationName is not necessarily an identifier, so it's incorrect to
call `getName()` on e.g. overloaded operators. Since the
DiagnosticBuilder has correctly handled Decl printing, we don't need to
find the printable name ourselves.

Fixes llvm#78101
Menooker pushed a commit that referenced this pull request Sep 3, 2024
…104523)

Compilers and language runtimes often use helper functions that are
fundamentally uninteresting when debugging anything but the
compiler/runtime itself. This patch introduces a user-extensible
mechanism that allows for these frames to be hidden from backtraces and
automatically skipped over when navigating the stack with `up` and
`down`.

This does not affect the numbering of frames, so `f <N>` will still
provide access to the hidden frames. The `bt` output will also print a
hint that frames have been hidden.

My primary motivation for this feature is to hide thunks in the Swift
programming language, but I'm including an example recognizer for
`std::function::operator()` that I wished for myself many times while
debugging LLDB.

rdar://126629381


Example output. (Yes, my proof-of-concept recognizer could hide even
more frames if we had a method that returned the function name without
the return type or I used something that isn't based off regex, but it's
really only meant as an example).

before:
```
(lldb) thread backtrace --filtered=false
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #3: 0x0000000100003968 a.out`std::__1::__function::__alloc_func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()[abi:se200000](this=0x000000016fdff280, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:171:12
    frame #4: 0x00000001000026bc a.out`std::__1::__function::__func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()(this=0x000000016fdff278, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:313:10
    frame llvm#5: 0x0000000100003c38 a.out`std::__1::__function::__value_func<int (int, int)>::operator()[abi:se200000](this=0x000000016fdff278, __args=0x000000016fdff224, __args=0x000000016fdff220) const at function.h:430:12
    frame llvm#6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame llvm#7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame llvm#8: 0x0000000183cdf154 dyld`start + 2476
(lldb) 
```

after

```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame llvm#6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame llvm#7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame llvm#8: 0x0000000183cdf154 dyld`start + 2476
Note: Some frames were hidden by frame recognizers
```
Menooker pushed a commit that referenced this pull request Sep 3, 2024
`JITDylibSearchOrderResolver` local variable can be destroyed before
completion of all callbacks. Capture it together with `Deps` in
`OnEmitted` callback.

Original error:

```
==2035==ERROR: AddressSanitizer: stack-use-after-return on address 0x7bebfa155b70 at pc 0x7ff2a9a88b4a bp 0x7bec08d51980 sp 0x7bec08d51978
READ of size 8 at 0x7bebfa155b70 thread T87 (tf_xla-cpu-llvm)
    #0 0x7ff2a9a88b49 in operator() llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:58
    #1 0x7ff2a9a88b49 in __invoke<(lambda at llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:9) &, const llvm::DenseMap<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> >, llvm::DenseMapInfo<llvm::orc::JITDylib *, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> > > > &> libcxx/include/__type_traits/invoke.h:149:25
    #2 0x7ff2a9a88b49 in __call<(lambda at llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:9) &, const llvm::DenseMap<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> >, llvm::DenseMapInfo<llvm::orc::JITDylib *, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> > > > &> libcxx/include/__type_traits/invoke.h:224:5
    #3 0x7ff2a9a88b49 in operator() libcxx/include/__functional/function.h:210:12
    #4 0x7ff2a9a88b49 in void std::__u::__function::__policy_invoker<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr,
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.