forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 0
Fix hoist-static-allocs when function parameters are returned #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
Menooker
wants to merge
3,528
commits into
main
Choose a base branch
from
fix_hoist_static_allocs
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
@ZhennanQin @xurui1995 Hi, this is a bug fix of upstream MLIR. The bug was found by @xurui1995. Would you please help review it? Thanks! |
xurui1995
approved these changes
Jul 31, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
) GCC supports code like "asm volatile ("" : "=r" (i) : "0" (f))" where i is integer type and f is floating point type. Currently this code produces an error with Clang. The change allows mixed scalar types between input and output constraints. Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
…lvm#106453) The primary motivation behind this is to allow the enum type to be referred to earlier in the Sema.h file which is needed for llvm#106321. It was requested in llvm#106321 that a scoped enum be used (rather than moving the enum declaration earlier in the Sema class declaration). Unfortunately doing this creates a lot of churn as all use sites of the enum constants had to be changed. Appologies to all downstream forks in advanced. Note the AA_ prefix has been dropped from the enum value names as they are now redundant.
Summary: This provides the `_l` variants for the `stdlib.h` functions. These are just copies of the same entrypoint and don't do anything with the locale information.
Summary: This adds the locale variants of the string functions. As previously, these do not use the locale information at all and simply copy the non-locale version which expects the "C" locale.
…#106469) add true16/fake16 flag to gfx12 dasm tests including vop1, vop1_dpp, vop3_from_vop1 and vop3_from_vop1_dpp. This is a test only change.
We can do slightly better on the neutral value when we have nsz.
…ectors in IRTranslator. (llvm#106580)
…ectorized. Need to consider the maximum type size in the graph before doing attempt for the vectorization of non-power-of-2 number of elements, which may be less than MinVF.
assume all functions used in a HWASan module potentially touch shadow memory (and short granules).
…ion (llvm#106568) As reported in llvm#105648 (comment) commit 08ad8dc7154bf3ab79f750e6d5fb7df597c7601a introduced a nullptr dereference in the case when store contains a binding to a symbol that has no origin region associated with it, such as the symbol generated when a pointer is passed to an opaque function.
Reverts llvm#106565 Broke clang tests
This ensures we skip any instructions identified to be ignored by the legacy cost model as well. Fixes a divergence between legacy and VPlan-based cost model. Fixes llvm#106417.
…tor>> (llvm#106263) Fixes llvm#106261 rdar://133991190
…itializerToDecl(). (llvm#106235) Control flow analysis performed by a static analysis tool revealed the potential for null pointer dereferences to occur in conjunction with the `Init` parameter in `Sema::AddInitializerToDecl()`. On entry to the function, `Init` is required to be non-null as there are multiple potential branches that unconditionally dereference it. However, there were two places where `Init` is compared to null thus implying that `Init` is expected to be null in some cases. These checks appear to be purely defensive checks and thus unnecessary. Further, there were several cases where code checked `Result`, a variable of type `ExprResult`, for an invalid value, but did not check for a valid but null value and then proceeded to unconditionally dereference the potential null result. This change elides the unnecessary defensive checks and changes some checks for an invalid result to instead branch on an unusable result (either an invalid result or a valid but null result).
…vm#106496) These combines call getNumElements() which isn't valid for scalable vectors.
One of the tests for the new fake use intrinsic are failing on darwin buildbots due to relying on behaviour for their expected triple; this commit adds explicit triples to the few remaining fake-use tests that didn't have them. Fixes commit 3d08ade (llvm#86149). Buildbot failures: https://lab.llvm.org/buildbot/#/builders/23/builds/2505
…m#106335) As agreed on itanium-cxx-abi/cxx-abi#109 these placeholders should be mangled as a `template-prefix` production. ``` <template-prefix> ::= <template unqualified-name> # global template ::= <prefix> <template unqualified-name> # nested template ::= <template-param> # template template parameter ::= <substitution> ``` Previous to this patch, the template template parameter case was not handled, and template template parameters were incorrectly being handled as unqualified-names. Before llvm#95202, DeducedTemplateType was not canonicalized correctly, so that template template parameter declarations were retained uncanonicalized. After llvm#95202, they are correctly canonicalized, but this now leads to these TTPs being anonymous entities, where the mangling implementation correctly doesn't expect an anonymous declaration of this kind, leading to a crash. Fixes llvm#106182.
…t internal shell syntax (llvm#105902) This patch rewrites a test that uses command substitution `$()` and the `stat` command, which are not supported by lit's internal shell. Instead of using this syntax to perform the file size comparison done in this test, a Python script is used instead to perform the same operation. Fixes llvm#102384.
This preserves the semantis of fneg and matches what we do in LegalizeDAG. I kept the legal FSUB check to force unrolling for some targets that don't have FSUB but have XOR. On Aarch64, using xor broke some tests that expected to see a (v1f64 (fma (insertvector_elt (f64 (fneg (extractvectorelt X)))))) pattern.
…und. If the value is used in Scalar several times, the first attempt to find its position in the node (if ReuseShuffleIndices and ReorderIndices not empty) may fail. In this case need to find another copy of the same value and try again. Fixes llvm#106626
The behavior deliberately mimics that of clang. Ideally, -print-pipeline-passes should be a first-class driver option. Notes to this effect have been added in the appropriate places in both flang and clang. --------- Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
…vm#106600) `uitofp` and `sitofp` instructions use the default rounding mode which is defined as round-to-nearest.
The interceptor types are supposed to match size_t (and the non-Windows ssize_t) exactly, but on 32-bit Windows `size_t` uses `unsigned int` whereas `SIZE_T` is `unsigned long`. The current definition results in `uptr` not matching `uintptr_t` since we otherwise get typedef redefinition errors. Work around this by using a #define instead of a typedef when defining SIZE_T. It would probably be cleaner to stop using these uppercase types, but that is a rather invasive change and this one is the minimal change to allow uptr to match uintptr_t on Windows. To ensure this compiles on Windows, we also remove the interceptor.h defines of uptr (that do not always match __sanitizer::uptr) and rely on __sanitizer::uptr instead. The interceptor types most likely predate those other types so clean up the unnecessary definition while here. This also reverts commit 18e06e3 and commit bb27dd8. Reviewed By: mstorsjo, vitalybuka Pull Request: llvm#106311
When the shuffle masks are `PoisonMaskElem`, there is not need to check the cost of `SK_ExtractSubvector`. It is free. Otherwise, it will cause the compiler to crash. Assertion `(Idx + EltsPerVector) <= alignTo(NumElts, EltsPerVector) && "SK_ExtractSubvector index out of range"' failed.
llvm#106889) We can't assume closed world even in full LTO post-link stage. It is only true if we are building a "GPU executable". However, AMDGPU does support "dyamic library". I'm not aware of any approach to tell if it is relocatable link when we create the pass. For now let's revert the patch as it is currently breaking things. We can re-enable it once we can handle it correctly.
NEON has non-IEEE compliant denormal flushing and the compiler should check if it safe to vectorize instructions for NEON in non-fast math mode. Fixes llvm#106909
Implement cost computation for VPWidenCallRecipe. In some cases, targets use argument info to compute intrinsic costs. If all operands of the call are VPValues with an underlying IR value, use the IR values as arguments. PR: llvm#106731
This patch reduces the memory usage for import lists by employing memory-efficient data structures. With this patch, an import list for a given destination module is basically DenseSet<uint32_t> with each element indexing into the deduplication table containing tuples of: {SourceModule, GUID, Definition/Declaration} In one of our large applications, the peak memory usage goes down by 9.2% from 6.120GB to 5.555GB during the LTO indexing step. This patch addresses several sources of space inefficiency associated with std::unordered_map: - std::unordered_map<GUID, ImportKind> takes up 16 bytes because of padding even though ImportKind only carries one bit of information. - std::unordered_map uses pointers to elements, both in the hash table proper and for collision chains. - We allocate an instance of std::unordered_map for each {Destination Module, Source Module} pair for which we have at least one import. Most import lists have less than 10 imports, so the metadata like the size of std::unordered_map and the pointer to the hash table costs a lot relative to the actual contents.
We have special handling for this in type legalization, but we didn't have a test.
Branches exiting the loop will remain regardless, so don't consider them in collectValuesToIgnore. This fixes another divergence between legacy and VPlan-based cost model. Fixes llvm#106780.
This is for an upcoming change to the threshold on Apple targets for using a constant pool for FP literals versus building them with integer moves. This file is based on literal_pools_float.ll. I tried to bolt on to the existing test, but it got messy as that file is already testing a matrix of combinations, so creating this new file instead.
…finx. We should use RMM instead of DYN.
…stead of using isel patterns.
…=` (llvm#106803) Time trace profiler support was added into LLVMgold in cd3255a. This patch adds its `-plugin-opt` counterpart, which is just an alias to `--time-trace=`, into LLD for compatibility.
Widening/narrowing the source data type to match the destination data type may require multiple steps. To model the costs, the patch generated the interim type by following the logic in RISCVTargetLowering::lowerVPFPIntConvOp.
`HighMask` is the value that sets bits from `Msb+1` to 63 to 1, while the other bits are set to 0.
Currently the option prints a path to a nonexistent directory with the full triple, `lib/powerpc64-ibm-aix7.2.0.0`. It should only be `lib/aix`.
…d. NFC (llvm#106671) This predicate isn't bound to the scheduler model and and we may want to reuse it in the future. We already moved it to reuse it in our downstream.
…m#106890) `FindInstantiatedDecl()` relies on the `CurContext` to find the corresponding class template instantiation for a class template declaration. Previously, we pushed the semantic declaration context for constraint comparison, which is incorrect for constraints on friend declarations. In issue llvm#78101, the semantic context of the friend is the TU, so we missed the implicit template specialization `Template<void, 4>` when looking for the instantiation of the primary template `Template` at the time of checking the member instantiation; instead, we mistakenly picked up the explicit specialization `Template<float, 5>`, hence the error. As a bonus, this also fixes a crash when diagnosing constraints. The DeclarationName is not necessarily an identifier, so it's incorrect to call `getName()` on e.g. overloaded operators. Since the DiagnosticBuilder has correctly handled Decl printing, we don't need to find the printable name ourselves. Fixes llvm#78101
Menooker
pushed a commit
that referenced
this pull request
Sep 3, 2024
…104523) Compilers and language runtimes often use helper functions that are fundamentally uninteresting when debugging anything but the compiler/runtime itself. This patch introduces a user-extensible mechanism that allows for these frames to be hidden from backtraces and automatically skipped over when navigating the stack with `up` and `down`. This does not affect the numbering of frames, so `f <N>` will still provide access to the hidden frames. The `bt` output will also print a hint that frames have been hidden. My primary motivation for this feature is to hide thunks in the Swift programming language, but I'm including an example recognizer for `std::function::operator()` that I wished for myself many times while debugging LLDB. rdar://126629381 Example output. (Yes, my proof-of-concept recognizer could hide even more frames if we had a method that returned the function name without the return type or I used something that isn't based off regex, but it's really only meant as an example). before: ``` (lldb) thread backtrace --filtered=false * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10 frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25 frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12 frame #3: 0x0000000100003968 a.out`std::__1::__function::__alloc_func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()[abi:se200000](this=0x000000016fdff280, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:171:12 frame #4: 0x00000001000026bc a.out`std::__1::__function::__func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()(this=0x000000016fdff278, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:313:10 frame llvm#5: 0x0000000100003c38 a.out`std::__1::__function::__value_func<int (int, int)>::operator()[abi:se200000](this=0x000000016fdff278, __args=0x000000016fdff224, __args=0x000000016fdff220) const at function.h:430:12 frame llvm#6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10 frame llvm#7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10 frame llvm#8: 0x0000000183cdf154 dyld`start + 2476 (lldb) ``` after ``` (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1 * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10 frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25 frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12 frame llvm#6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10 frame llvm#7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10 frame llvm#8: 0x0000000183cdf154 dyld`start + 2476 Note: Some frames were hidden by frame recognizers ```
Menooker
pushed a commit
that referenced
this pull request
Sep 3, 2024
`JITDylibSearchOrderResolver` local variable can be destroyed before completion of all callbacks. Capture it together with `Deps` in `OnEmitted` callback. Original error: ``` ==2035==ERROR: AddressSanitizer: stack-use-after-return on address 0x7bebfa155b70 at pc 0x7ff2a9a88b4a bp 0x7bec08d51980 sp 0x7bec08d51978 READ of size 8 at 0x7bebfa155b70 thread T87 (tf_xla-cpu-llvm) #0 0x7ff2a9a88b49 in operator() llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:58 #1 0x7ff2a9a88b49 in __invoke<(lambda at llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:9) &, const llvm::DenseMap<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> >, llvm::DenseMapInfo<llvm::orc::JITDylib *, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> > > > &> libcxx/include/__type_traits/invoke.h:149:25 #2 0x7ff2a9a88b49 in __call<(lambda at llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:9) &, const llvm::DenseMap<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> >, llvm::DenseMapInfo<llvm::orc::JITDylib *, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> > > > &> libcxx/include/__type_traits/invoke.h:224:5 #3 0x7ff2a9a88b49 in operator() libcxx/include/__functional/function.h:210:12 #4 0x7ff2a9a88b49 in void std::__u::__function::__policy_invoker<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, ```
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
buffer-results-to-out-params pass will have a nullptr error when hoist-static-allocs option is on, when the return value of a function is a parameter of the function. This PR fixes this issue and let the pass remove the return value in the ReturnOp when