Skip to content

Conversation

@parthchandra
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

While running an Iceberg benchmark on a graviton cluster, @hsiang-c found the following calls taking ~12% of the execution time

_dl_tlsdesc_dynamic
__aarch64_cas8_acq_rel
__aarch64_swp8_acq_rel
__aarch64_ldadd8_rel

The __aarch64_cas8_acq_rel, __aarch64_swp8_acq_rel, __aarch64_ldadd8_rel calls can be optimized on graviton V2 and newer ARM processors into a single instruction.
The _dl_tlsdesc_dynamic call can be optimized to reduce the overhead of thread local storage resolution by the using a static linking model (https://maskray.me/blog/2021-02-14-all-about-thread-local-storage)

What changes are included in this PR?

Adds a separate build for a graviton binary. Loading of the native binary checks for graviton processors and then loads the graviton version if necessary.

This can be overridden by specifying the environment variable - NATIVE_VARIANT_ENV

The build for graviton uses nightly in the build container. This is for the -Z tls-model=initial-exec flag which is only available in nightly.

The build also adds explicit dependencies on serde and serde_urlencoded which are depended on transitively by reqwest and are somehow not resolved while building on graviton.

The build adds to the size of the uber jar

-rwxr-xr-x@ 1 parth  staff   90455632 Feb  2 10:32 ./common/target/classes/org/apache/comet/linux/aarch64/libcomet.so
-rwxr-xr-x@ 1 parth  staff   98510816 Feb  2 10:20 ./common/target/classes/org/apache/comet/linux/amd64/libcomet.so
-rwxr-xr-x@ 1 parth  staff   95560232 Feb  2 10:43 ./common/target/classes/org/apache/comet/linux/graviton/libcomet.so

The build-comet-release.sh script is also updated and makes it a little easier to specify which platforms to build the binary for. Building for graviton is optional

How are these changes tested?

Manually by running the build-release-comet.sh script. Script and binary independently verified by @hsiang-c

Contributor note: @hsiang-c contributed to this PR. Also Claude Code contributed.

Also modify the dependencies. For graviton reqwest does not pull in serde transitively so specify it explicitly
@parthchandra parthchandra requested review from andygrove and mbutrovich and removed request for andygrove February 2, 2026 19:29
# Requires Rust nightly for -Z tls-model flag
# Uses initial-exec TLS model, LSE, and native CPU optimizations for Graviton
core-graviton-libs:
cd native && RUSTFLAGS="-Z tls-model=initial-exec -C target-feature=+lse -C target-cpu=native" \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -Z tls-model=initial-exec flag requires nightly. The gain from this change only came to about 1% so we can revert this if we don't want to use nightly

@parthchandra parthchandra changed the title graviton specific build build: graviton specific build Feb 2, 2026
@mbutrovich
Copy link
Contributor

I propose just changing the target-cpu on ARM builds to neoverse-n1 which gets you LSE (atomics) and is a reasonable baseline for ARM architectures available in 2026 from cloud vendors. It's also a subset of what Apple Silicon provides. Newer neoverse targets would be great, but I don't know if we want to require new Graviton and eliminate stuff like Ampere Altra. I'd need to do a survey of GCP and Azure ARM cores.

@parthchandra
Copy link
Contributor Author

I propose just changing the target-cpu on ARM builds to neoverse-n1 which gets you LSE (atomics) and is a reasonable baseline for ARM architectures available in 2026 from cloud vendors. It's also a subset of what Apple Silicon provides. Newer neoverse targets would be great, but I don't know if we want to require new Graviton and eliminate stuff like Ampere Altra. I'd need to do a survey of GCP and Azure ARM cores.

One reason for doing it this way was that building for LSE and running on an ARM version that doesn't support LSE causes an instant crash. I don't have any real world data on how many people may be on older versions of ARM but didn't want to take the chance.

@codecov-commenter
Copy link

codecov-commenter commented Feb 3, 2026

Codecov Report

❌ Patch coverage is 12.50000% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.89%. Comparing base (f09f8af) to head (525a556).
⚠️ Report is 918 commits behind head on main.

Files with missing lines Patch % Lines
...mon/src/main/java/org/apache/comet/NativeBase.java 12.50% 24 Missing and 4 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3366      +/-   ##
============================================
+ Coverage     56.12%   59.89%   +3.77%     
- Complexity      976     1465     +489     
============================================
  Files           119      175      +56     
  Lines         11743    16196    +4453     
  Branches       2251     2692     +441     
============================================
+ Hits           6591     9701    +3110     
- Misses         4012     5138    +1126     
- Partials       1140     1357     +217     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mbutrovich
Copy link
Contributor

I propose just changing the target-cpu on ARM builds to neoverse-n1 which gets you LSE (atomics) and is a reasonable baseline for ARM architectures available in 2026 from cloud vendors. It's also a subset of what Apple Silicon provides. Newer neoverse targets would be great, but I don't know if we want to require new Graviton and eliminate stuff like Ampere Altra. I'd need to do a survey of GCP and Azure ARM cores.

One reason for doing it this way was that building for LSE and running on an ARM version that doesn't support LSE causes an instant crash. I don't have any real world data on how many people may be on older versions of ARM but didn't want to take the chance.

As best as I can tell, LSE was added in ARMv-8.1-A, which predates anything I think we'd find in modern data centers. Azure and GCP use something newer than neoverse-n1, and n1 seems like a reasonable baselines for current Graviton offerings. In the same way probably don't want to ship binaries for anything that doesn't have AVX2, I think it's reasonable to cut off ARM without LSE, at least for shipped binaries. Folks are free to compile their own if they need, in the same way that they'll get a faster build if they compile on their target architecture.

I summarized my thoughts in #3368

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants