Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Linux aarch64 Support for Vale Pip Package #60

Merged

Conversation

geoffreynyaga
Copy link
Contributor

This PR introduces support for the Linux aarch64 (ARM 64-bit) architecture in the Vale pip package.

See Linked Issue: #40 (comment)

The following changes were made:

Changes:

  • Updated the get_target() function to correctly identify and handle aarch64 as a supported architecture.
  • Modified the download logic to fetch and install the correct Vale binary for Linux aarch64.

Steps to Reproduce the Issue:

#40 (comment)

This update ensures broader usability of the Vale pip package across diverse hardware platforms.

Copy link
Owner

@daniperez daniperez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @geoffreynyaga for this contribution! Lots of mac users will benefit from it! 👏

Other than a couple petty requests for the code itself, could you test this also on a Mac, native? You are detecting macs with platform.machine(), whereas before we used (for some reason) platform.processor(). Just as a sanity check. I checked myself the doc for the methods and, on paper, we should be fine.

vale/main.py Outdated
f"Architecture {os.uname().machine} not supported. "
"Supported architectures are 'x86_64' and 'arm'"
f"Architecture '{machine}' is not supported. "
"Supported architectures are 'x86_64', 'arm64', and 'i386'."
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are trying to show to the user the families of architectures that are supported, but would it perhaps be more useful showing here also amd64, aarch64 and i686 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right, and I appreciate the attention to detail. Running platform.processor() natively on a Mac does returns 'arm', while platform.machine() returns 'arm64'. So technically, the old code was correct for Macs (natively). However, when running in a VM environment on a Mac, both platform.processor() and platform.machine() return 'aarch64'.
This discrepancy was the primary driver for the change, as the platform detection logic needed to accommodate the VM behavior as well.

Mac Silicon Native

python3 
>> import platform
>> platform.processor()
'arm'

>> platform.machine()
'arm64'

Virtualisation on a Mac Silicon

python3 
>> import platform
>> platform.processor()
'aarch64'

>> platform.machine()
'aarch64'

I'll make sure to test the updated code natively on a Mac as a sanity check and confirm that everything aligns correctly. Thank you for pointing this out!

vale/main.py Outdated
f"Operating system '{sys.platform}' not supported. "
"Supported operating systems are 'linux', 'darwin', and 'win32'."
)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not your fault, but could I ask you to take the opportunity to move this code to the else branch in line 48? Just to be symmetrical to the architecture check. Thanks! 🙏

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@SecondSkoll
Copy link

SecondSkoll commented Dec 9, 2024

Hi @daniperez, I was helping @geoffreynyaga to sanity check his discovery and fix of this issue and I wanted to provide my own suggestion for you to sanity check.

We found the error to be in the following section of code:

    if platform.processor() == "arm":
        architecture = "arm64"
    else:
        convert_arch = {"32bit": "32-bit", "64bit": "64-bit"}
        architecture = convert_arch.get(platform.architecture()[0], None)

It turns out platform.processor() can return a null string, or the .machine() result in some cases, which is the crux of this issue. For @geoffreynyaga it was returning aarch64.

A temporary fix for this specific case would be to just use if platform.processor() in {"arm", "aarch64"}: but this doesn't really help any other users with edge cases. It seems there are some unsolved issues with this function on Apple silicon - likely related to virtualisation from context.

I can't see any real pathway forward with the lack of certainty around the outputs of platform.*() commands (a lot of them just say that an empty string is returned if nothing is determined). It's possible that adopting a different identification mechanism like cpuinfo may address this. It seems that using cpuinfo.get_cpu_info()['arch'] and then checking for 'ARM_7', 'ARM_8', or 'X86_64' could be one solution - but it is a bit slower.

What are your thoughts?

@geoffreynyaga
Copy link
Contributor Author

@SecondSkoll
I have checked the cpuinfo commands both natively and on a VM.

Native

python3
Python 3.13.0 (main, Oct  7 2024, 05:02:14) [Clang 16.0.0 (clang-1600.0.26.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from cpuinfo import get_cpu_info
>>> get_cpu_info()
{'python_version': '3.13.0.final.0 (64 bit)', 'cpuinfo_version': [9, 0, 0], 'cpuinfo_version_string': '9.0.0', 'arch': 'ARM_8', 'bits': 64, 'count': 8, 'arch_string_raw': 'arm64', 'brand_raw': 'Apple M1 Pro'}
>>> get_cpu_info()['arch']
'ARM_8'
>>> get_cpu_info()['arch_string_raw']
'arm64'

On VM

python3                             
Python 3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from cpuinfo import get_cpu_info
>>> get_cpu_info()
{'python_version': '3.10.12.final.0 (64 bit)', 'cpuinfo_version': [9, 0, 0], 'cpuinfo_version_string': '9.0.0', 'arch': 'ARM_8', 'bits': 64, 'count': 4, 'arch_string_raw': 'aarch64', 'flags': ['aes', 'asimd', 'asimddp', 'asimdfhm', 'asimdhp', 'asimdrdm', 'atomics', 'cpuid', 'crc32', 'dcpodp', 'dcpop', 'dit', 'evtstrm', 'fcma', 'flagm', 'flagm2', 'fp', 'fphp', 'frint', 'ilrcpc', 'jscvt', 'lrcpc', 'paca', 'pacg', 'pmull', 'sb', 'sha1', 'sha2', 'sha3', 'sha512', 'ssbs', 'uscat'], 'vendor_id_raw': 'ARM'}
>>> get_cpu_info()['arch']
'ARM_8'
>>> get_cpu_info()['arch_string_raw']
'aarch64'

@daniperez
Copy link
Owner

I can't see any real pathway forward with the lack of certainty around the outputs of platform.*() commands (a lot of them just say that an empty string is returned if nothing is determined). It's possible that adopting a different identification mechanism like cpuinfo may address this. It seems that using cpuinfo.get_cpu_info()['arch'] and then checking for 'ARM_7', 'ARM_8', or 'X86_64' could be one solution - but it is a bit slower.

What are your thoughts?

Hi @SecondSkoll ! Thanks for the contribution.

I feel a bit reluctant to add a dependency to an otherwise dependency-free application, but I'm willing to reconsider that. @SecondSkoll I assume the problem persists even with @geoffreynyaga 's suggested fix? Can you somehow reproduce the issue? It would be useful to see what platform.machine() and platform.processor() return in the failing cases (if they can be reproduced).

@daniperez
Copy link
Owner

Another, less than optimal solution, would be to add a way to allow choosing the package to download or choosing the package to use. Some environment variable or similar. It would be needed only the first time the application is run.

@SecondSkoll
Copy link

SecondSkoll commented Dec 10, 2024

I feel a bit reluctant to add a dependency to an otherwise dependency-free application, but I'm willing to reconsider that. @SecondSkoll I assume the problem persists even with @geoffreynyaga 's suggested fix? Can you somehow reproduce the issue? It would be useful to see what platform.machine() and platform.processor() return in the failing cases (if they can be reproduced).

Sadly I don't have the hardware to reproduce this issue myself, but after some searching I've seen a fair few mentions of these edge cases. Rosetta 2 pretty much always returns 'i386' from what I've seen, for example - which wouldn't be solved by this fix (unless you want to steer people with actual i386 systems to download the wrong binary - as they are 32 bit systems and there is no 32 bit Vale binary). For it to be called out in the platform docs is also indicative of the extent of the issues:

An empty string is returned if the value cannot be determined. Note that many platforms do not provide this information or simply return the same value as for machine(). NetBSD does this.

Stack overflow shows a fair few examples of people being confused by this over the years. The env variable may be the best solution. Nothing really seems perfect.

@daniperez
Copy link
Owner

@geoffreynyaga @SecondSkoll what do you think if we merge the PR as it stands today, given that, while not perfect, it's a marginal improvement for some users? I´d implement in another PR the mechanism to force this package to download a specific architecture.

Another solution, more future-proof and elegant, is along point (5) in #30 (comment) . That is, publish wheels for all the supported architectures, instead of a single package for all, and let the user choose with pip what to install.

@SecondSkoll
Copy link

SecondSkoll commented Dec 13, 2024

@geoffreynyaga @SecondSkoll what do you think if we merge the PR as it stands today, given that, while not perfect, it's a marginal improvement for some users? I´d implement in another PR the mechanism to force this package to download a specific architecture.

I have reservations around the current logic which would support 32bit architecture and those systems would get through all the error checks to find that there's no available binary - but I think it's a separate problem to the core of this issue.

Still, I don't think you should mention 'i386' and 'i686' as being supported - because they shouldn't be. The only reason it was registering the binary as working on a 32bit Ubuntu VM was the fact that it was a VM - and only some virtualisation environments can run 64bit binaries on 32bit VMs, native 32bit systems wont be able to.

Now, the chances of running into an actual 32bit machine in the wild is low, and getting lower (you'd struggle to find a system that isn't more than 20 years old), but it's probably still a good idea to provide an error for those systems. Note: The logic that allows the download of a 32bit binary existed in the code before this issue was raised - it's likely just been rarely encountered - if ever, and obviously not reported.

@geoffreynyaga
Copy link
Contributor Author

I've now added logic to raise error for 32-bit systems.
Additionally, I've tested the updated code on WSL, amd64, and aarch64 environments, all with success.
Let me know if you see any further gaps or have additional suggestions.

@mathstuf
Copy link

I can confirm that this works on Asahi Linux.

@SecondSkoll
Copy link

@daniperez This should be good to merge now - until that more future proof approach can be implemented. :)

@daniperez daniperez merged commit febdd62 into daniperez:main Jan 12, 2025
@daniperez
Copy link
Owner

Hi guys. Sorry for the delay coming back to this. 3.9.2.1 is the first version to include this patch. Let me know if you find anything weird.

Thanks @geoffreynyaga for pushing for this fix. This allows a significant amount of new people to use Vale, people who were excluded before.

Thanks also to the other reviewers for your scrutiny and your tests!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants