Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for big-endian targets #1555

Closed
briansmith opened this issue Nov 9, 2022 · 13 comments
Closed

Add support for big-endian targets #1555

briansmith opened this issue Nov 9, 2022 · 13 comments
Milestone

Comments

@briansmith
Copy link
Owner

  1. Solve C (or Rust) fallback for all functions that work for all little-endian targets #1455 to get everything that depends on bn_mul_mont working when bn_mul_mont isn't available.
  2. Adjust build.rs and the #[cfg(target_arch)] logic throughout the project so that the assembly language implementations are never used on big-endian targets (specifically, big-endian 32-bit ARM and AArch64 need to fall back).
  3. Remove the unnecessary tests that we're building on a little-endian target.
  4. Add at least one big-endian target to CI that actually runs tests. This can be done in QEMU like the AAarch64 and 32-bit ARM tests are run.
  5. Improve the code test coverage to be closer to 100% and do an audit so that we can be sure there is no endian-specific logic.
@briansmith
Copy link
Owner Author

Regarding the CI changes, here are the specific changes I am looking for:

  • If we want to support any 64-bit big-endian target, then we must pick a single 64-bit big-endian Linux target X and change the build system to fully support it:
  • modify mk/install-build-tools.sh, mk/cargo.sh, and mk/check-symbol-prefixes.sh so that mk/cargo.sh test --target=X runs the tests for a 64-bit big-endian target. This way, I (and everybody) will be able to run the test suite locally to ensure we don't break the 64-bit big-endian code.
  • modify .github/workflows/ci.yml so that that the test jobs for that target are run in CI, and so that the coverage jobs are run for that target in CI.

Similarly, if we want to add support for any 32-bit big-endian target, then we need to pick a single 32-bit big-endian Linux target Y and make the analogous changes to the build system.

For that initial 64-bit target (and 32-bit target), we should use clang as the compiler, if at all practical; see the scripts in mk/ to see how we have optimized the build system to use clang for cross-compiling already.

The above implies we should choose a target that qemu and clang support.

At least when we add the first big-endian target, we should avoid any architecture-specific or operating-system-specific logic in those changes. Please see the process I suggest in #1455. If we can completely avoid any architecture-specific or operating-system-specific logic in the changes that add big-endian support, then adding a single 64-bit and a single 32-bit big-endian target to the GitHub Actions configuration will provide good test coverage. It's not realistic to expect we'll be able to add all of the big-endian targets we wish to support to CI due to GitHub Actions limitations, as we already have a huge build matrix.

When we decide which initial target(s) to add support for, we should focus on targets that have profiler_builtins support; see rust-lang/rust#79556. Otherwise the code coverage reporting won't work for the big-endian targets. Measuring the code coverage is really important, especially for people maintaining ring who don't normally deal with big-endian targets.

I think that the big-endian variants of the ARM64 and 32-bit ARM targets are particularly attractive to add to CI because then we'll have proper test coverage to ensure we aren't accidentally trying to wrongly use the little-endian-specific assembly language implementations that we have for ARM64 and 32-bit ARM. Indeed, we probably need to add cfg(target_endian) logic to our dispatching logic that currently only looks at target_arch, IIUC.

@briansmith
Copy link
Owner Author

As I mentioned in #1455 (comment), I am working on expanding the test suite so that it will be more likely to catch any issues that arise in adding the big-endian support.

Changes to BoringSSL in the last several months also look like they'll make supporting big-endian targets easier with less divergence from BoringSSL upstream. I'm presently working on merging all of the changes from BoringSSL into ring so that (I hope) fewer big-endian-specific changes to ring will be needed.

@uweigand
Copy link
Contributor

Hi @briansmith I've added cross-building and -testing support to #1297.

The PR now contains three commits

  • Add non-assembler fallbacks for all functions (in particular providing bn_mul_mont).
  • Generic (platform-independent) support for big-endian platforms
  • Basic enablement for s390x, supporting native build, cross build, and qemu-based cross-testing

With this patch, I can successfully complete cargo test --all-targets natively, and I can also successfully complete

./mk/cargo.sh test --all-targets --target=s390x-unknown-linux-gnu

on an x86_64 Ubuntu 22.04 host.

RING_COVERAGE does not work out of the box as rust is not built with profile library support on s390x (I don't think there is currently any platform except for Intel and ARM that supports this). I can look into what would be necessary to enable this on our platform.

@briansmith
Copy link
Owner Author

With this patch, I can successfully complete cargo test --all-targets natively, and I can also successfully complete

./mk/cargo.sh test --all-targets --target=s390x-unknown-linux-gnu

on an x86_64 Ubuntu 22.04 host.

That's awesome. I'm also investigating another approach to the bn_mul_mont issue since we actually already have a Rust version of that function, as suggested by @bltavares long ago in PR #1278.

RING_COVERAGE does not work out of the box as rust is not built with profile library support on s390x (I don't think there is currently any platform except for Intel and ARM that supports this). I can look into what would be necessary to enable this on our platform.

I think that would be awesome and well worth the effort, because that would enable lots of projects to be a lot more comfortable accepting patches to support s390x (and big endian in general) code paths. Once we figure out how to do it for one such platform, I bet it will be easy to do it for other similar targets like the PowerPC ones. See rust-lang/rust#60476 and rust-lang/rust#76035 for some examples of PRs for other targets, and see the related issue rust-lang/rust#79556.

@uweigand
Copy link
Contributor

That's awesome. I'm also investigating another approach to the bn_mul_mont issue since we actually already have a Rust version of that function, as suggested by @bltavares long ago in PR #1278.

Hmm, I thought there was still the problem that the P-384 code requires a C (or asm) implementation of bn_mul_mont in many places?

I think that would be awesome and well worth the effort, because that would enable lots of projects to be a lot more comfortable accepting patches to support s390x (and big endian in general) code paths. Once we figure out how to do it for one such platform, I bet it will be easy to do it for other similar targets like the PowerPC ones. See rust-lang/rust#60476 and rust-lang/rust#76035 for some examples of PRs for other targets, and see the related issue rust-lang/rust#79556.

I've now submitted rust-lang/rust#104304 -- let's see how this goes. I've verified using a local build of the rust compiler that with this fix, that I can complete a run of

RING_COVERAGE=1 ./mk/cargo.sh test --all-targets --target=s390x-unknown-linux-gnu

and the resulting coverage files look at first glance reasonable.

@briansmith
Copy link
Owner Author

That's awesome. I'm also investigating another approach to the bn_mul_mont issue since we actually already have a Rust version of that function, as suggested by @bltavares long ago in PR #1278.

Hmm, I thought there was still the problem that the P-384 code requires a C (or asm) implementation of bn_mul_mont in many places?

That's true. I just ended up implementing bn_mul_mont in terms of that Rust code in PR #1558 to deal with that issue.

I've now submitted rust-lang/rust#104304 -- let's see how this goes. I've verified using a local build of the rust compiler that with this fix, that I can complete a run of

RING_COVERAGE=1 ./mk/cargo.sh test --all-targets --target=s390x-unknown-linux-gnu

and the resulting coverage files look at first glance reasonable.

That's awesome! Thanks for doing that.

@uweigand
Copy link
Contributor

That's true. I just ended up implementing bn_mul_mont in terms of that Rust code in PR #1558 to deal with that issue.

Ah, I see. Using that PR instead of the first commit in #1297 also works on s390x. Thanks!

@uweigand
Copy link
Contributor

rust-lang/rust#104304 has now been merged and the feature is available in the nightly build as of today. Using this build, I was able to successfully run

RING_COVERAGE=1 ./mk/cargo.sh test --all-targets --target=s390x-unknown-linux-gnu

against current mainline ring plus the latest patch set in #1297, on an x86_64 Ubuntu 22.04 host.

@briansmith
Copy link
Owner Author

@uweigand Thanks for all your help here. I suspect that the main missing thing now is to ensure that the assembly language implementations (for ARM/Aarch64) are never used when the endianness is big-endian.

@briansmith
Copy link
Owner Author

And, one more thing: We'll need to ensure that the changes to make code endian-neutral are not regressing performance in terms of performance impact for the existing little-endian targets. Probably we can do this just by doing a diff of the assembly output for the changed code to make sure the diff is (effectively) empty.

@uweigand
Copy link
Contributor

@uweigand Thanks for all your help here. I suspect that the main missing thing now is to ensure that the assembly language implementations (for ARM/Aarch64) are never used when the endianness is big-endian.

I believe this should already be the case; the big-endian ARM targets use a different target triple (aarch64_be-unknown-linux-gnu), so none of the existing "aarch64" target checks should match. I was unable to easily verify this, as the Ubuntu cross-compile logic doesn't appear to support the aarch64_be target yet.

@uweigand
Copy link
Contributor

uweigand commented Nov 23, 2022

And, one more thing: We'll need to ensure that the changes to make code endian-neutral are not regressing performance in terms of performance impact for the existing little-endian targets. Probably we can do this just by doing a diff of the assembly output for the changed code to make sure the diff is (effectively) empty.

My original implementation did not guarantee that there are no assembly changes on little-endian targets. While I believe the actual changes were marginal and would not have measurable performance impact, I agree with the goal of ensuring actual identical assembler code - just to make sure.

I've now reworked the big-endian support patch to actually ensure this. In fact, with the patch, code on little-endian targets is unchanged from current code even after preprocessing (and therefore are compiling on any target with any compiler flags). The new implementation still looks reasonable (w.r.t. amount of changes compared to upstream) to me - total patch size even went down a bit. Let me know what you think!

@briansmith
Copy link
Owner Author

This is done. 32-bit big-endian PoewrPC support and 64-bit big-endian s390x support is already merged. #1677 adds 64-bit big-endian PowerPC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants