[Enhancement] Optimize code in arm #55072
Open · +41 −20
Why I'm doing:
ARM is slower than x86 in some cases.
What I'm doing:
Before this PR, `rf` (runtime filter) `insert_hash`, StreamVByte decoding, and `int128_mul_overflow` were at least 3x slower on ARM than on x86. With this PR, built with Clang 17, ARM is as fast as or faster than x86 in these cases.
For example:

```sql
select count(t.a) from (select cast(id_decimal as float) as a from test_all_type_select) as t
```

- x86: 0.11s
- ARM: 0.36s
- ARM with this PR: 0.11s
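For readers unfamiliar with `int128_mul_overflow`, here is a minimal sketch of what a signed 128-bit multiply-with-overflow check computes. The name `mul_overflow_i128` is hypothetical: this is not StarRocks' `int128_mul_overflow`, and it says nothing about how this PR makes that function faster on ARM. It assumes the GCC/Clang `__int128` extension and favours clarity over speed (it uses one 128-bit division, which a tuned version would likely avoid).

```cpp
#include <cassert>

using i128 = __int128;
using u128 = unsigned __int128;

// Hypothetical helper for illustration only; NOT StarRocks' int128_mul_overflow.
// Returns true if a * b does not fit in a signed 128-bit integer.
// On success (returns false), *res holds the exact product.
inline bool mul_overflow_i128(i128 a, i128 b, i128* res) {
    assert(res != nullptr);

    // Take magnitudes in unsigned arithmetic so |INT128_MIN| = 2^127 is representable.
    const u128 ua = a < 0 ? u128(0) - u128(a) : u128(a);
    const u128 ub = b < 0 ? u128(0) - u128(b) : u128(b);
    const bool negative = (a < 0) != (b < 0);

    // Unsigned overflow check on the magnitudes (one 128-bit division).
    const u128 umax = ~u128(0);
    if (ub != 0 && ua > umax / ub) {
        return true;
    }
    const u128 mag = ua * ub;

    // Largest allowed magnitude: 2^127 for a negative result, 2^127 - 1 otherwise.
    const u128 limit = (u128(1) << 127) - (negative ? 0 : 1);
    if (mag > limit) {
        return true;
    }

    // Negate back in unsigned arithmetic; mag == 2^127 maps to INT128_MIN.
    *res = negative ? i128(u128(0) - mag) : i128(mag);
    return false;
}
```

For example, `(2^100) * (2^20)` fits and yields `2^120`, while `(2^100) * (2^100)` reports overflow, as does `INT128_MIN * -1`. GCC and Clang also provide `__builtin_mul_overflow`, which accepts `__int128` operands; how efficiently it is lowered depends on the compiler and target.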
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: