Conversation
Thanks for looking into this! I think you could incorporate
@mvines noted! This is getting closer and I updated the note above with a few questions. When you or another maintainer gets a chance, I'd love some feedback :)
@sakridge do we still need the jemalloc at all? can we yank it?
Ripping out the jemalloc stuff seems preferable to more featurization to me |
core/src/validator.rs (Outdated)

@@ -1388,7 +1388,7 @@ fn report_target_features() {
        }
    );

-    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
+    #[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), not(target_os = "macos")))]
macos m1 claims to be target_arch = "x86" or target_arch = "x86_64"? that seems terrible
But is that just for the interpreter? There is a mode on the M1 to execute x86 code, so it has to report that.
wouldn't this feature check affect the x86 macs which have AVX?
To get this going on an M1, you have to install all of your stuff with Rosetta 2, a tool that allows you to run x86_64 software. This amounts to using a version of rust/cargo etc. that targets x86_64 instead of aarch64. And so to get this running, I had to insert this extremely confusing feature flag.

Without Rosetta, rust would target aarch64 – I tried to do this initially, and gave up after a day as there were several more blockers upstream.
Ah, got it. Would you mind sneaking a comment into the code here about why we're excluding macos? Otherwise somebody with an Intel MBP (everybody on the core team) is likely to remove this line later inadvertently.
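A minimal sketch of the kind of comment being asked for here. The cfg line is taken from the diff above; the surrounding function and names are illustrative, not the actual validator.rs code:

// Solana is built under Rosetta 2 on Apple Silicon, so the compiler still
// reports target_arch = "x86_64" there even though the CPU has no AVX.
// macOS is therefore excluded from this AVX check; please don't remove the
// exclusion just because the check passes on an Intel Mac.
#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), not(target_os = "macos")))]
fn report_avx_support() {
    // Runtime feature detection only exists on x86/x86_64, which the cfg guarantees.
    if is_x86_feature_detected!("avx") {
        println!("AVX detected");
    } else {
        println!("AVX not detected");
    }
}

fn main() {
    #[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), not(target_os = "macos")))]
    report_avx_support();
}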
Can't say I use jemalloc. I would remove it. I think
Thanks @cdrappi, looks like there's some CI fallout. The changes themselves all look great.
…fail 2 tests, one seems worse than the other
Thank you @mvines, we should be in business now!
Codecov Report
@@            Coverage Diff            @@
##           master   #16346     +/-   ##
=========================================
- Coverage    79.9%    79.9%    -0.1%
=========================================
  Files         413      412       -1
  Lines      110634   110584      -50
=========================================
- Hits        88499    88449      -50
  Misses      22135    22135
Thank you for this contribution @cdrappi!
Hi @cdrappi, thanks a lot for your PR. I was able to build from source under Rosetta 🎉
@zarex5 This is exactly the error I received before adding this twisted feature flag to the code. It was giving me this even though I was building it from my machine, like the situation you seem to be in. I'm surprised you're still getting this given the flag is in there now.

This was helpful for me when I was tracking down what my computer was running (eg. making sure everything was using Rosetta): make a new bin, I called it arch-detector. Add a main.rs, edit your Cargo.toml, then check the output of the new binary. If you get the x86_64 output, then you can be sure that feature flag will prevent that error.
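The code blocks from that comment were lost in this page capture, so here is a reconstructed sketch of what such an arch-detector bin could look like; names and messages are illustrative:

// src/main.rs of a throwaway "arch-detector" bin (cargo new arch-detector).
// It prints which target the toolchain compiled it for, which tells you
// whether you are on an x86_64 (Rosetta 2) or a native aarch64 toolchain.
fn main() {
    #[cfg(all(target_os = "macos", target_arch = "x86_64"))]
    println!("x86_64 on macOS: this toolchain runs under Rosetta 2");

    #[cfg(all(target_os = "macos", target_arch = "aarch64"))]
    println!("aarch64 on macOS: native Apple Silicon toolchain");

    #[cfg(not(target_os = "macos"))]
    println!("not macOS; the M1 workaround does not apply here");
}

Running cargo run in that bin then shows which case you are in; if it prints the x86_64 line, the not(target_os = "macos") exclusion is what keeps the AVX check from firing.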
A ha! @zarex5 I think I know what you're talking about. If we install Solana via the install script and then run the installed binary, the logs still show that error. This is because the version installed by that script does not include these changes yet. Let me know if that clears up any confusion you may have! I say this with like 80% confidence that I know what I'm talking about, so I'm happy to troubleshoot with you. It would probably help us both out :)
That's exactly it, my x86 build was just fine but I was still using the version installed by the install script.
This reverts commit 54a04ba.
uses tikv_jemallocator::Jemalloc as the global allocator (#20149) (backported in #20166)

#16346 switched the default allocator from jemalloc to the system allocator, but that has shown regressions in the form of higher RAM usage causing nodes to go OOM: https://discord.com/channels/428295358100013066/439194979856809985/890413193858539580 This commit sets jemalloc as the default allocator.

(cherry picked from commit 2cf081d)

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
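For context, a minimal sketch of how a Rust binary opts into jemalloc as its global allocator with the tikv-jemallocator crate; this follows that crate's documented usage, not Solana's exact code, and the version number is illustrative:

// Cargo.toml would need something like: tikv-jemallocator = "0.4"
#[cfg(not(target_env = "msvc"))]
use tikv_jemallocator::Jemalloc;

// With this attribute, every heap allocation in the binary goes through
// jemalloc instead of the system allocator.
#[cfg(not(target_env = "msvc"))]
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    let v: Vec<u64> = (0..1_000).collect();
    println!("allocated {} u64s via the global allocator", v.len());
}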
Problem
Solana does not build on Macs with M1 chips.
Summary of Changes
- Bump the cc crate
- Bump ring to version 0.16.17: it's required by rustls, which has some wiggle room on ring versions
- Remove jemallocator, now defaulting to the system allocator
- Exclude target_os = "macos" from the AVX check

Fixes #16072
Current state
Good: It builds, runs (with ./run.sh), and cargo +nightly bench works.

Bad: cargo test and cargo test --release both fail different tests. After having run these many times each and getting a feel for their behavior, I've gotten similar but not exactly the same failures. I'd love to get some color on what you expect from these runs.

cargo test
- test_kill_heaviest_partition fails every time with: 'assertion failed: loop_start.elapsed() < loop_timeout', local-cluster/src/cluster_tests.rs:284:9
- test_kill_partition_switch_threshold_no_progress fails sometimes with a 'get_slot for 6iXMbNNczChSGvo4EAThk8udgApPK9Sk8vcsc1w4Nq3 failed', local-cluster/src/cluster_tests.rs:332:37
- test_future_tower_master_only has failed, but more rarely

cargo test --release
- test_push_votes_with_tower fails all the time, on either assert_eq!(vote_slots.len(), MAX_LOCKOUT_HISTORY); (with many different vote_slots lengths) or assert!(vote_slot != 23);. From knowing almost nothing about Solana, I'm most concerned with the last type of failure case here.