
Apple M1 compatibility #16346

Merged: 20 commits into solana-labs:master, Apr 10, 2021

Conversation

@cdrappi (Contributor) commented Apr 4, 2021

Problem

Solana does not build on Macs with Apple M1 chips

Summary of Changes

  • Upgrade the cc crate
  • Explicitly pin ring to version 0.16.17: it's required by rustls, which allows some wiggle room on ring versions (see the Cargo.toml sketch below)
  • Remove memory profiling with jemallocator, defaulting to the system allocator instead
  • Exclude target_os = "macos" from the AVX check

Fixes #16072
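One common way to pin an exact version in Cargo.toml (a sketch of the general technique, not necessarily the exact change in this PR):

[dependencies]
# Require exactly 0.16.17 so Cargo cannot resolve ring to a different release
ring = "=0.16.17"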

Current state

Good: It builds, runs (with ./run.sh), and cargo +nightly bench works

Bad: cargo test and cargo test --release each fail different tests. After running them many times and getting a feel for their behavior, I've seen similar but not identical failures across runs. I'd love some color on what you expect from these runs

cargo test
  • test_kill_heaviest_partition fails every time with: 'assertion failed: loop_start.elapsed() < loop_timeout', local-cluster/src/cluster_tests.rs:284:9
  • test_kill_partition_switch_threshold_no_progress fails sometimes with a 'get_slot for 6iXMbNNczChSGvo4EAThk8udgApPK9Sk8vcsc1w4Nq3 failed', local-cluster/src/cluster_tests.rs:332:37
  • test_future_tower_master_only has also failed, but more rarely
cargo test --release
  • test_push_votes_with_tower fails every time, on either assert_eq!(vote_slots.len(), MAX_LOCKOUT_HISTORY); (with many different vote_slots lengths) or assert!(vote_slot != 23);. Knowing almost nothing about Solana, I'm most concerned with the last type of failure (a generic repro sketch follows below)
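For anyone wanting to reproduce one of these in isolation, something like the following should work (a sketch; it assumes the crate under local-cluster/ is named solana-local-cluster):

# Run a single flaky test by itself and show its output
cargo test -p solana-local-cluster test_kill_heaviest_partition -- --nocapture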

@mvines added CI Pull Request is ready to enter CI and removed CI Pull Request is ready to enter CI labels Apr 4, 2021
@solana-grimes removed the CI Pull Request is ready to enter CI label Apr 4, 2021
@mvines (Contributor) commented Apr 4, 2021

Thanks for looking into this!

I think you could incorporate install/MacM1.md into the existing README instructions around https://github.com/solana-labs/solana#building

@mvines added the CI Pull Request is ready to enter CI label Apr 4, 2021
@solana-grimes removed the CI Pull Request is ready to enter CI label Apr 4, 2021
@cdrappi (Contributor, Author) commented Apr 5, 2021

@mvines noted! This is getting closer and I updated the note above with a few questions. When you or another maintainer gets a chance, I'd love some feedback :)

@aeyakovenko (Member) commented:

@sakridge do we still need the jemalloc at all? can we yank it?

@mvines (Contributor) commented Apr 5, 2021

@sakridge do we still need the jemalloc at all? can we yank it?

Ripping out the jemalloc stuff seems preferable to more featurization to me

@@ -1388,7 +1388,7 @@ fn report_target_features() {
 }
 );

-#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
+#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), not(target_os = "macos")))]
@mvines (Contributor), Apr 5, 2021

macos m1 claims to be target_arch = "x86" or target_arch = "x86_64"? that seems terrible

@sakridge (Contributor), Apr 5, 2021

But is that just for the interpreter? There is the mode on m1 to execute x86 code, so it has to report that.

Contributor:

wouldn't this feature check affect the x86 macs which have AVX?

@cdrappi (Contributor, Author), Apr 5, 2021

To get this going on an M1, you have to install all of your stuff with Rosetta 2, a tool that allows you to run x86_64 software. This amounts to using a version of rust/cargo etc. that targets x86_64 instead of aarch64. And so to get this running, I had to insert this extremely confusing feature flag

Without Rosetta, rust would target aarch64 – I tried to do this initially, and gave up after a day as there were several more blockers upstream

Contributor:

Ah, got it. Would you mind sneaking a comment into the code here about why we're excluding macos? Otherwise somebody with an Intel MBP (everybody on the core team) is likely to remove this line later inadvertently
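A sketch of what such an inline comment could look like (hypothetical wording, not the exact text that landed in the PR):

// Building on an Apple M1 currently requires Rosetta 2, which reports the
// host as x86_64 but does not expose AVX, so skip the AVX check on macOS.
// Removing the not(target_os = "macos") clause breaks M1 builds.
#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), not(target_os = "macos")))]
{
    // ... AVX feature check runs here on non-macOS x86 targets ...
}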

@cdrappi (Contributor, Author):

@sakridge yes it would and is a clear downside of including this feature check. If you could think of a better way to get around this, I'm happy to implement it and be your Mac M1 guinea pig guy

@mvines yes of course! Was planning on this and appreciate the reminder

@sakridge (Contributor) commented Apr 5, 2021

@sakridge do we still need the jemalloc at all? can we yank it?

Ripping out the jemalloc stuff seems preferable to more featurization to me

Can't say I use jemalloc. I would remove it. I think heaptrack is a better tool for finding memory use issues and that requires the system allocator.

@cdrappi (Contributor, Author) commented Apr 5, 2021

Thank you @mvines and @sakridge! I will work on removing jemalloc and bug you when I think I have something :)

@mvines added the CI Pull Request is ready to enter CI label Apr 6, 2021
@solana-grimes removed the CI Pull Request is ready to enter CI label Apr 6, 2021
@cdrappi changed the title from "WIP: Apple M1 compatibility" to "Apple M1 compatibility" Apr 8, 2021
@cdrappi (Contributor, Author) commented Apr 8, 2021

Hey @mvines and @sakridge, I've ripped out jemalloc from solana-measure and all its uses. Lmk what you think about these failing tests and anything I may have missed!

@mvines added the CI Pull Request is ready to enter CI label Apr 9, 2021
@solana-grimes removed the CI Pull Request is ready to enter CI label Apr 9, 2021
@mvines (Contributor) commented Apr 9, 2021

Thanks @cdrappi, looks like programs/bpf/Cargo.lock needs an update as well (it's another lock file we have in the tree :-/)
Would you mind also rebasing your PR (git pull --rebase) to get rid of the merge commits in here?

The changes themselves all look great
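For reference, one way to do that rebase (a sketch; it assumes the solana-labs remote is named upstream and uses the apple-m1 branch name from this PR):

# Replay the branch's commits on top of the latest upstream master,
# then update the PR branch (the force push rewrites the now-linear history)
git fetch upstream
git rebase upstream/master
git push --force-with-lease origin apple-m1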

@cdrappi (Contributor, Author) commented Apr 9, 2021

Thank you @mvines we should be in business now!

@mvines added the CI Pull Request is ready to enter CI label Apr 9, 2021
@solana-grimes removed the CI Pull Request is ready to enter CI label Apr 9, 2021
@mvines (Contributor) commented Apr 9, 2021

cargo fmt please :)

@mvines added the CI Pull Request is ready to enter CI label Apr 9, 2021
@solana-grimes removed the CI Pull Request is ready to enter CI label Apr 9, 2021
@codecov bot commented Apr 9, 2021

Codecov Report

Merging #16346 (98812e9) into master (b08cff9) will decrease coverage by 0.0%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##           master   #16346     +/-   ##
=========================================
- Coverage    79.9%    79.9%   -0.1%     
=========================================
  Files         413      412      -1     
  Lines      110634   110584     -50     
=========================================
- Hits        88499    88449     -50     
  Misses      22135    22135             

@mvines merged commit 54a04ba into solana-labs:master Apr 10, 2021
@mvines (Contributor) commented Apr 10, 2021

Thank you for this contribution @cdrappi!

@zarex5 commented Apr 10, 2021

Hi @cdrappi, thanks a lot for your PR. I was able to build from source under Rosetta 🎉
However, I get this message when starting solana-test-validator: "ERROR solana_core::validator] Your machine does not have AVX support, please rebuild from source on your machine" and then "Validator process aborted."
Are you able to run it on your side? Thanks in advance!

@cdrappi (Contributor, Author) commented Apr 10, 2021

@zarex5 This is exactly the error I received before adding this twisted feature flag to the code. It was giving me this even though I was building from my own machine, like the situation you seem to be in, so I'm surprised you're still getting it now that the flag is in there. This was helpful for me when I was tracking down what my computer was running (e.g. making sure everything was using Rosetta):

Make a new bin, I called it arch-detector:
cargo new --bin arch-detector && cd arch-detector

Add a build.rs to the root with this content:

fn main() {
    // Forward the build-time TARGET triple to the compiled binary as an env var
    println!(
        "cargo:rustc-env=TARGET={}",
        std::env::var("TARGET").unwrap()
    );
}

and edit your src/main.rs to look like this:

// TARGET is injected at compile time by build.rs
const TARGET: &str = env!("TARGET");

fn main() {
    println!("Target: {}", TARGET);
}

Then check the output of cargo run. Mine says:

Target: x86_64-apple-darwin

If you get this output, then you can be sure that feature flag will prevent check_avx from running. If not, my hunch is that some part of your install may not be using Rosetta. If this is still a problem, you and I can spitball a solution on Discord or something and then maybe add a little more detail to the M1 setup documentation

@cdrappi (Contributor, Author) commented Apr 10, 2021

Aha! @zarex5 I think I know what you're talking about. If we install Solana via the install script and then run solana-test-validator, it will show the error you mention when we inspect the logs. Here's the output I get:

❯ solana-test-validator

Ledger location: test-ledger
Log: test-ledger/validator.log
⠉ Initializing...
Validator process aborted. The validator log may contain further details

The logs show me this:

[a bunch of stuff and then...]
ERROR solana_core::validator] Your machine does not have AVX support, please rebuild from source on your machine

This is because the version installed by that script does not include these changes yet; v1.6.5 and greater will have them, going forward. For now, we will have to run a local solana node using ./run.sh from the solana repo root. To my understanding, calling ./run.sh does the same thing as solana-test-validator, except using the version you built from source rather than the solana-cli installed from the script mentioned above (currently version 1.6.3)

Let me know if that clears up confusion you may have! I say this with like 80% confidence I know what I'm talking about so happy to troubleshoot with you. It would probably help us both out :)
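One quick way to confirm which binary you are actually running (a sketch; the install path shown is the installer's usual location and may differ on your machine):

# Show which solana-test-validator is first on PATH and what version it reports.
# If it resolves to ~/.local/share/solana/install/..., it is the installer's
# release (e.g. 1.6.3) rather than your local source build.
which solana-test-validator
solana-test-validator --version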

@zarex5 commented Apr 11, 2021

That's exactly it: my x86 build was just fine, but I was still using the version installed by the install script.
Running ./run.sh works perfectly. Thank you so much for your help!

@cdrappi deleted the apple-m1 branch April 11, 2021 16:59
sakridge added a commit to sakridge/solana that referenced this pull request Sep 12, 2021
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Sep 23, 2021
solana-labs#16346
switched the default allocator from jemalloc to the system allocator, but that
has shown regressions in the form of higher RAM usage, causing nodes to go OOM.

This commit sets jemalloc as the default allocator.
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Sep 23, 2021
solana-labs#16346
switched the default allocator from jemalloc to the system allocator, but that
has shown regressions in the form of higher RAM usage, causing nodes to go OOM:
https://discord.com/channels/428295358100013066/439194979856809985/890413193858539580

This commit sets jemalloc as the default allocator.
behzadnouri added a commit that referenced this pull request Sep 24, 2021
#16346
switched the default allocator from jemalloc to the system allocator, but that
has shown regressions in the form of higher RAM usage, causing nodes to go OOM:
https://discord.com/channels/428295358100013066/439194979856809985/890413193858539580

This commit sets jemalloc as the default allocator.
mergify bot pushed a commit that referenced this pull request Sep 24, 2021
#16346
switched the default allocator from jemalloc to the system allocator, but that
has shown regressions in the form of higher RAM usage, causing nodes to go OOM:
https://discord.com/channels/428295358100013066/439194979856809985/890413193858539580

This commit sets jemalloc as the default allocator.

(cherry picked from commit 2cf081d)

# Conflicts:
#	Cargo.lock
mergify bot added a commit that referenced this pull request Sep 24, 2021
…0149) (#20166)

* uses tikv_jemallocator::Jemalloc as the global allocator (#20149)

#16346
switched the default allocator from jemalloc to the system allocator, but that
has shown regressions in the form of higher RAM usage, causing nodes to go OOM:
https://discord.com/channels/428295358100013066/439194979856809985/890413193858539580

This commit sets jemalloc as the default allocator.

(cherry picked from commit 2cf081d)

# Conflicts:
#	Cargo.lock

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Sep 24, 2021
…lana-labs#20149) (solana-labs#20166)

* uses tikv_jemallocator::Jemalloc as the global allocator (solana-labs#20149)

solana-labs#16346
switched the default allocator from jemalloc to the system allocator, but that
has shown regressions in the form of higher RAM usage, causing nodes to go OOM:
https://discord.com/channels/428295358100013066/439194979856809985/890413193858539580

This commit sets jemalloc as the default allocator.

(cherry picked from commit 2cf081d)

# Conflicts:
#	Cargo.lock

* removes backport merge conflicts

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
dankelleher pushed a commit to identity-com/solana that referenced this pull request Nov 24, 2021
…#20149)

solana-labs#16346
switched the default allocator from jemalloc to the system allocator, but that
has shown regressions in the form of higher RAM usage, causing nodes to go OOM:
https://discord.com/channels/428295358100013066/439194979856809985/890413193858539580

This commit sets jemalloc as the default allocator.
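The referenced commits above restore jemalloc, via tikv_jemallocator, as the global allocator. For reference, that hookup generally looks like the following (a generic sketch; the exact placement and any cfg gating in those commits may differ):

// Declare tikv_jemallocator's Jemalloc as the process-wide global allocator,
// replacing the system allocator that this PR had made the default.
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;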
Successfully merging this pull request may close these issues.

Failed to build on Mac m1