Skip to content

Fix UBSan "misaligned-pointer-use" warning on aarch64 #6

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 27, 2024

Conversation

hebasto
Copy link
Member

@hebasto hebasto commented Jan 14, 2024

As data type is a pointer to uint8_t, the access to the memory via the (uint64_t *)data pointer is misaligned. It can be an issue on ARM hardware. As suggested in https://en.cppreference.com/w/cpp/language/reinterpret_cast:

When it is needed to interpret the bytes of an object as a value of a different type, std::memcpy ... can be used.

For more details, please refer to bitcoin/bitcoin#29178.

Also: https://stackoverflow.com/questions/32062894/take-advantage-of-arm-unaligned-memory-access-while-writing-clean-c-code.

@hebasto
Copy link
Member Author

hebasto commented Jan 20, 2024

From bitcoin/bitcoin#29178 (comment):

Can you update bitcoin-core/crc32c-subtree#6 so that the commit includes a description of the problem, an explanation of the fix, and any further information. We should probably add a comment inline explaining why a memcpy is being added to that code.

Done.

cc @theuni

@luke-jr
Copy link
Member

luke-jr commented Jan 23, 2024

This doesn't make sense. 10 lines above, we already do the same thing - why isn't that an issue? Between then and this, data is only incremented by sizeof(uint64_t) which shouldn't break alignment...

Can't we just guarantee the data is always aligned somehow?

@fanquake
Copy link
Member

Compiling on an aarch64 Fedora machine, with gcc (GCC) 14.0.1 20240118 (Red Hat 14.0.1-0), master (bitcoin/bitcoin@e69796c) and this change (43180d7) produce exactly the same assembly for crc32c::ExtendArm64. ASM here: https://www.diffchecker.com/e9n7HIWP/.

So if there is an actual bug, this change just makes the UBSan (false positive?) warning "go away", and the bug would still be in the binary in any case?

@hebasto
Copy link
Member Author

hebasto commented Feb 6, 2024

As far as I understand, there are two possible cases depending on the actual compiler and the ARM hardware.

  1. The code works without UB. In that case, this PR makes the intention to access data via the misaligned pointer explicit and silences an UBSan warning.

  2. The code does cause UB.

Assuming that it is not feasible to test every ARM hardware and compiler combination, this change is an improvement.

@theuni
Copy link

theuni commented Feb 8, 2024

This doesn't make sense. 10 lines above, we already do the same thing - why isn't that an issue? Between then and this, data is only incremented by sizeof(uint64_t) which shouldn't break alignment...

Agree this is strange. I have no explanation.

Maybe an instruction that assumes alignment only gets emitted in the second loop? I don't have enough compiler-fu to even hazard a guess as to the difference.

@theuni
Copy link

theuni commented Feb 8, 2024

This doesn't make sense. 10 lines above, we already do the same thing - why isn't that an issue? Between then and this, data is only incremented by sizeof(uint64_t) which shouldn't break alignment...

Agree this is strange. I have no explanation.

Oh 🤦

This could be the case if length < KBYTES (size < 1024) :)

@theuni
Copy link

theuni commented Feb 9, 2024

@hebasto Would it be possible for you to test the above theory by reproducing the UBSan warning with either:

  • A gdb breakpoint that allows you to probe for size
  • An assertion at the top of the function that size < 1024
  • A printf of the size at the top of the function

?

@hebasto
Copy link
Member Author

hebasto commented Feb 9, 2024

@theuni

Would it be possible for you to test the above theory by reproducing the UBSan warning with either...

I have no access to the aarch64 machine at the moment. I'll try to regain it as soon as possible.

@theuni
Copy link

theuni commented Feb 13, 2024

This doesn't make sense. 10 lines above, we already do the same thing - why isn't that an issue? Between then and this, data is only incremented by sizeof(uint64_t) which shouldn't break alignment...

Agree this is strange. I have no explanation.

Oh 🤦

This could be the case if length < KBYTES (size < 1024) :)

Ok, I can confirm this is the case.

I managed to reproduce on macos, and added some assertions and printfs.

While fuzzing with utxo_snapshot and utxo_total_supply, length never grows past (by my eye) 171. And an assertion that length < KBYTES never fails.

So I think @hebasto and @luke-jr are both correct here. The cast is not safe and should be a memcpy. But that's the case for the __crc32cd call 10 lines up as well, we just happen to not hit it while fuzzing.

I can confirm that we also fall into the if(length & 4) and if(length & 2) cases, though ubsan doesn't yell about those. Maybe because they're always 4byte aligned? But regardless, I think those should get the memcpy treatment as well.

Ping @hebasto . Mind extending the memcpy treatment to the other cases as well just to be safe?

@dergoegge ping for visibility

@hebasto
Copy link
Member Author

hebasto commented Feb 14, 2024

Mind extending the memcpy treatment to the other cases as well just to be safe?

Thanks! Done.

@dergoegge
Copy link
Member

What about these:

if (length & 4) {
crc = __crc32cw(crc, *(uint32_t *)data);
data += 4;
}
if (length & 2) {
crc = __crc32ch(crc, *(uint16_t *)data);
data += 2;
}

I think I agree with cory that it'd be best to cover those too.

@theuni
Copy link

theuni commented Feb 15, 2024

I discussed this with @madars yesterday, who suggested using -Wcast-align=strict to check for this. Indeed it blows up here:

crc32c/src/crc32c_arm64.cc:91:28: warning: cast from ‘const uint8_t*’ {aka ‘const unsigned char*’} to ‘uint64_t*’ {aka ‘long unsigned int*’} increases required alignment of target type [-Wcast-align]
   91 |     crc = __crc32cd(crc3, *(uint64_t *)data);
      |                            ^~~~~~~~~~~~~~~~
crc32c/src/crc32c_arm64.cc:101:27: warning: cast from ‘const uint8_t*’ {aka ‘const unsigned char*’} to ‘uint64_t*’ {aka ‘long unsigned int*’} increases required alignment of target type [-Wcast-align]
  101 |     crc = __crc32cd(crc, *(uint64_t *)data);
      |                           ^~~~~~~~~~~~~~~~
crc32c/src/crc32c_arm64.cc:107:27: warning: cast from ‘const uint8_t*’ {aka ‘const unsigned char*’} to ‘uint32_t*’ {aka ‘unsigned int*’} increases required alignment of target type [-Wcast-align]
  107 |     crc = __crc32cw(crc, *(uint32_t *)data);
      |                           ^~~~~~~~~~~~~~~~
crc32c/src/crc32c_arm64.cc:112:27: warning: cast from ‘const uint8_t*’ {aka ‘const unsigned char*’} to ‘uint16_t*’ {aka ‘short unsigned int*’} increases required alignment of target type [-Wcast-align]
  112 |     crc = __crc32ch(crc, *(uint16_t *)data);

I think those warnings are legitimate.

It also complains about the macros which lead to CRC32C32BYTES, which at first glance also look unsafe.

So basically the entire accelerated arm64 path is at best iffy and at worst unsafe. I'm wondering if we should just disable it completely?

Edit: s/-wcast-align=strong/-Wcast-align=strict/

@theuni
Copy link

theuni commented Feb 15, 2024

It also complains about the macros which lead to CRC32C32BYTES, which at first glance also look unsafe.

Sure enough. I cloned the upstream crc32c repo and built/ran tests with sanitizers on my m1 macbook:

cmake -S . -B build -DCMAKE_CXX_FLAGS=-fsanitize="undefined" -DCRC32C_BUILD_BENCHMARKS=0 -DCMAKE_CXX_COMPILER="clang++"

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /Users/cory/dev/crc32c/src/crc32c_arm64.cc:83:5 in
/Users/cory/dev/crc32c/src/crc32c_arm64.cc:83:5: runtime error: load of misaligned address 0x00016fc00d39 for type 'const uint64_t' (aka 'const unsigned long long'), which requires 8 byte alignment
0x00016fc00d39: note: pointer points here
 29 f1 bf  93 6d 4d 33 1f 11 09 07  0b 15 25 3b 57 79 a1 cf  03 3d 7d c3 0f 61 b9 17  7b e5 55 cb 47
              ^
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /Users/cory/dev/crc32c/src/crc32c_arm64.cc:83:5 in
/Users/cory/dev/crc32c/src/crc32c_arm64.cc:89:27: runtime error: load of misaligned address 0x00016fc01041 for type 'uint64_t' (aka 'unsigned long long'), which requires 8 byte alignment
0x00016fc01041: note: pointer points here
 11 09 07  0b 15 25 3b 57 79 a1 cf  03 3d 7d c3 0f 61 b9 17  7b e5 55 cb 47 c9 51 df  73 0d ad 53 ff

Yikes.

@hebasto
Copy link
Member Author

hebasto commented Feb 16, 2024

Addressed all cases pointed out when using -Wcast-align=strict.

@theuni
Copy link

theuni commented Feb 16, 2024

Nice. With this patch hacked into the upstream benchmarks, I don't see any significant difference before/after:

cmake -S . -B build -DCRC32C_BUILD_BENCHMARKS=1 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER="clang++"
before:

Benchmark                                   Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------
CRC32CBenchmark/Public/256               40.1 ns         40.1 ns     17462325 bytes_per_second=5.9465G/s
CRC32CBenchmark/Public/4096               309 ns          309 ns      2261705 bytes_per_second=12.3261G/s
CRC32CBenchmark/Public/65536             3474 ns         3474 ns       201535 bytes_per_second=17.57G/s
CRC32CBenchmark/Public/1048576          54909 ns        54907 ns        12712 bytes_per_second=17.7856G/s
CRC32CBenchmark/Public/16777216        884746 ns       884746 ns          783 bytes_per_second=17.6604G/s
CRC32CBenchmark/Portable/256              106 ns          106 ns      6562849 bytes_per_second=2.24057G/s
CRC32CBenchmark/Portable/4096            1169 ns         1169 ns       598066 bytes_per_second=3.26452G/s
CRC32CBenchmark/Portable/65536          18260 ns        18260 ns        38309 bytes_per_second=3.34247G/s
CRC32CBenchmark/Portable/1048576       293165 ns       293165 ns         2388 bytes_per_second=3.3311G/s
CRC32CBenchmark/Portable/16777216     4709515 ns      4709477 ns          149 bytes_per_second=3.31778G/s
CRC32CBenchmark/ArmCRC32C/256            40.1 ns         40.1 ns     17461541 bytes_per_second=5.94759G/s
CRC32CBenchmark/ArmCRC32C/4096            309 ns          309 ns      2263021 bytes_per_second=12.3365G/s
CRC32CBenchmark/ArmCRC32C/65536          3473 ns         3473 ns       201501 bytes_per_second=17.5724G/s
CRC32CBenchmark/ArmCRC32C/1048576       54904 ns        54904 ns        12691 bytes_per_second=17.7867G/s
CRC32CBenchmark/ArmCRC32C/16777216     890130 ns       890118 ns          785 bytes_per_second=17.5538G/s

after:

Benchmark                                   Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------
CRC32CBenchmark/Public/256               40.1 ns         40.1 ns     17461105 bytes_per_second=5.94675G/s
CRC32CBenchmark/Public/4096               309 ns          309 ns      2260376 bytes_per_second=12.3257G/s
CRC32CBenchmark/Public/65536             3474 ns         3474 ns       201541 bytes_per_second=17.5681G/s
CRC32CBenchmark/Public/1048576          54887 ns        54887 ns        12730 bytes_per_second=17.7924G/s
CRC32CBenchmark/Public/16777216        891310 ns       891311 ns          785 bytes_per_second=17.5304G/s
CRC32CBenchmark/Portable/256              106 ns          106 ns      6568268 bytes_per_second=2.24044G/s
CRC32CBenchmark/Portable/4096            1169 ns         1169 ns       598066 bytes_per_second=3.26383G/s
CRC32CBenchmark/Portable/65536          18263 ns        18263 ns        38298 bytes_per_second=3.34197G/s
CRC32CBenchmark/Portable/1048576       293169 ns       293155 ns         2386 bytes_per_second=3.33122G/s
CRC32CBenchmark/Portable/16777216     4710921 ns      4710926 ns          149 bytes_per_second=3.31676G/s
CRC32CBenchmark/ArmCRC32C/256            40.1 ns         40.1 ns     17462412 bytes_per_second=5.94629G/s
CRC32CBenchmark/ArmCRC32C/4096            309 ns          309 ns      2262941 bytes_per_second=12.3297G/s
CRC32CBenchmark/ArmCRC32C/65536          3474 ns         3474 ns       201600 bytes_per_second=17.5712G/s
CRC32CBenchmark/ArmCRC32C/1048576       54918 ns        54917 ns        12715 bytes_per_second=17.7824G/s
CRC32CBenchmark/ArmCRC32C/16777216     889966 ns       889953 ns          785 bytes_per_second=17.5571G/s

However the sanitizer warnings are now gone.

Copy link

@theuni theuni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several different ways to do this, but I'm fine with this as-is because it demonstrates no obvious slowdown in the benchmarks.

ACK modulo optional nits.

@hebasto
Copy link
Member Author

hebasto commented Feb 16, 2024

Addressed @theuni's nits.

@hebasto hebasto force-pushed the 240114-aarch64-ub branch 2 times, most recently from 70a8d49 to 68ab55a Compare February 16, 2024 19:06
As `data` type is a pointer to `uint8_t`, the access to the memory via
the `(uint64_t *)data` pointer is misaligned. It can be an issue on ARM
hardware. As suggested in https://en.cppreference.com/w/cpp/language/reinterpret_cast:
"When it is needed to interpret the bytes of an object as a value of a
different type, std::memcpy ... can be used".

For more details, please refer to bitcoin/bitcoin#29178

Also: https://stackoverflow.com/questions/32062894/take-advantage-of-arm-unaligned-memory-access-while-writing-clean-c-code
Copy link

@theuni theuni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 1ac401e.

Thanks for sticking with this!
And thanks again to @madars for the warning recommendation.

@maflcko
Copy link

maflcko commented Feb 21, 2024

Can you submit the same upstream? It may not be maintained anymore, but if it is, they may want to pick this as well?

@hebasto
Copy link
Member Author

hebasto commented Feb 21, 2024

Can you submit the same upstream? It may not be maintained anymore, but if it is, they may want to pick this as well?

google#63

@theuni
Copy link

theuni commented Feb 27, 2024

@luke-jr @dergoegge @maflcko Mind having a quick look at this to confirm correctness?

Copy link
Member

@dergoegge dergoegge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 1ac401e

Code change looks correct. I've also done some differential fuzzing of the x86 implementation against both pre- and post-patch arm64 version.

@fanquake fanquake merged commit b60d2b7 into bitcoin-core:bitcoin-fork Feb 27, 2024
fanquake added a commit to fanquake/bitcoin that referenced this pull request Feb 27, 2024
b60d2b7334 Merge bitcoin-core/crc32c-subtree#6: Fix UBSan "misaligned-pointer-use" warning on aarch64
1ac401e32b Fix UBSan "misaligned-pointer-use" warning on aarch64

git-subtree-dir: src/crc32c
git-subtree-split: b60d2b733406cc64025095c6c2cb3933e222b529
@hebasto hebasto deleted the 240114-aarch64-ub branch February 27, 2024 18:29
fanquake added a commit to bitcoin/bitcoin that referenced this pull request Feb 28, 2024
5d45552 Squashed 'src/crc32c/' changes from 0bac72c455..b60d2b7334 (fanquake)

Pull request description:

  Update the crc32c subtree. Includes:
  * bitcoin-core/crc32c-subtree#6

  Which fixes #29178.

ACKs for top commit:
  hebasto:
    ACK 359a8d9.
  theuni:
    ACK 359a8d9
  dergoegge:
    ACK 359a8d9

Tree-SHA512: 2cec81a34ad26bbbc298aea5daffa41e56114d31cc2eb5fe486f46a77c3467bba22bdeca1c52ae97220e119d98818304272fc6337442af55282accabcd4c5833
luke-jr added a commit to bitcoinknots/bitcoin that referenced this pull request Mar 5, 2024
b60d2b73340 Merge bitcoin-core/crc32c-subtree#6: Fix UBSan "misaligned-pointer-use" warning on aarch64
1ac401e32bd Fix UBSan "misaligned-pointer-use" warning on aarch64

git-subtree-dir: src/crc32c
git-subtree-split: b60d2b733406cc64025095c6c2cb3933e222b529
kevkevinpal pushed a commit to kevkevinpal/bitcoin that referenced this pull request Mar 13, 2024
b60d2b7334 Merge bitcoin-core/crc32c-subtree#6: Fix UBSan "misaligned-pointer-use" warning on aarch64
1ac401e32b Fix UBSan "misaligned-pointer-use" warning on aarch64

git-subtree-dir: src/crc32c
git-subtree-split: b60d2b733406cc64025095c6c2cb3933e222b529
kevkevinpal pushed a commit to kevkevinpal/bitcoin that referenced this pull request Mar 13, 2024
b60d2b7334 Merge bitcoin-core/crc32c-subtree#6: Fix UBSan "misaligned-pointer-use" warning on aarch64
1ac401e32b Fix UBSan "misaligned-pointer-use" warning on aarch64

git-subtree-dir: src/crc32c
git-subtree-split: b60d2b733406cc64025095c6c2cb3933e222b529
janus pushed a commit to BitgesellOfficial/bitgesell that referenced this pull request Apr 6, 2024
b60d2b7334 Merge bitcoin-core/crc32c-subtree#6: Fix UBSan "misaligned-pointer-use" warning on aarch64
1ac401e32b Fix UBSan "misaligned-pointer-use" warning on aarch64

git-subtree-dir: src/crc32c
git-subtree-split: b60d2b733406cc64025095c6c2cb3933e222b529
@hebasto hebasto mentioned this pull request Sep 7, 2024
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Oct 24, 2024
5d45552 Squashed 'src/crc32c/' changes from 0bac72c455..b60d2b7334 (fanquake)

Pull request description:

  Update the crc32c subtree. Includes:
  * bitcoin-core/crc32c-subtree#6

  Which fixes bitcoin#29178.

ACKs for top commit:
  hebasto:
    ACK 359a8d9.
  theuni:
    ACK 359a8d9
  dergoegge:
    ACK 359a8d9

Tree-SHA512: 2cec81a34ad26bbbc298aea5daffa41e56114d31cc2eb5fe486f46a77c3467bba22bdeca1c52ae97220e119d98818304272fc6337442af55282accabcd4c5833
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Oct 24, 2024
5d45552 Squashed 'src/crc32c/' changes from 0bac72c455..b60d2b7334 (fanquake)

Pull request description:

  Update the crc32c subtree. Includes:
  * bitcoin-core/crc32c-subtree#6

  Which fixes bitcoin#29178.

ACKs for top commit:
  hebasto:
    ACK 359a8d9.
  theuni:
    ACK 359a8d9
  dergoegge:
    ACK 359a8d9

Tree-SHA512: 2cec81a34ad26bbbc298aea5daffa41e56114d31cc2eb5fe486f46a77c3467bba22bdeca1c52ae97220e119d98818304272fc6337442af55282accabcd4c5833
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants