[lldb][AArch64] Fix Apple M4 on Linux #135563

Open: wants to merge 1 commit into base: main

laverdet

This architecture implements SSVE but does not implement SVE.

More information is included in #121693

cc: @DavidSpickett

This architecture implements SSVE but does not implement SVE.
@laverdet laverdet requested a review from JDevlieghere as a code owner April 13, 2025 20:43
@llvmbot llvmbot added the lldb label Apr 13, 2025
@llvmbot
Member

llvmbot commented Apr 13, 2025

@llvm/pr-subscribers-lldb

Author: Marcel Laverdet (laverdet)

Changes

This architecture implements SSVE but does not implement SVE.

More information is included in #121693

cc: @DavidSpickett


Full diff: https://github.com/llvm/llvm-project/pull/135563.diff

1 Files Affected:

  • (modified) lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp (+11-11)
diff --git a/lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp b/lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp
index 884c7d4b9e359..f540a160c901a 100644
--- a/lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp
+++ b/lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp
@@ -107,19 +107,19 @@ NativeRegisterContextLinux::CreateHostNativeRegisterContextLinux(
     if (NativeProcessLinux::PtraceWrapper(PTRACE_GETREGSET,
                                           native_thread.GetID(), &regset,
                                           &ioVec, sizeof(sve_header))
-            .Success()) {
+            .Success())
       opt_regsets.Set(RegisterInfoPOSIX_arm64::eRegsetMaskSVE);
 
-      // We may also have the Scalable Matrix Extension (SME) which adds a
-      // streaming SVE mode.
-      ioVec.iov_len = sizeof(sve_header);
-      regset = NT_ARM_SSVE;
-      if (NativeProcessLinux::PtraceWrapper(PTRACE_GETREGSET,
-                                            native_thread.GetID(), &regset,
-                                            &ioVec, sizeof(sve_header))
-              .Success())
-        opt_regsets.Set(RegisterInfoPOSIX_arm64::eRegsetMaskSSVE);
-    }
+    // We may also have the Scalable Matrix Extension (SME) which adds
+    // a streaming SVE mode. Note that SVE and SSVE may be implemented
+    // independently, which is true on Apple's M4 architecture.
+    ioVec.iov_len = sizeof(sve_header);
+    regset = NT_ARM_SSVE;
+    if (NativeProcessLinux::PtraceWrapper(PTRACE_GETREGSET,
+                                          native_thread.GetID(), &regset,
+                                          &ioVec, sizeof(sve_header))
+            .Success())
+      opt_regsets.Set(RegisterInfoPOSIX_arm64::eRegsetMaskSSVE);
 
     sve::user_za_header za_header;
     ioVec.iov_base = &za_header;
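
The effect of the diff above can be sketched in isolation. This is a minimal illustration, not the LLDB implementation: `Regset`, `kMaskSVE`, `kMaskSSVE`, and the `Probe` callback are hypothetical stand-ins for `NT_ARM_SVE` / `NT_ARM_SSVE`, the `RegisterInfoPOSIX_arm64` masks, and the `PTRACE_GETREGSET` call respectively.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-ins for the register set kinds and option bits.
enum class Regset { SVE, SSVE };
constexpr uint32_t kMaskSVE = 1u << 0;
constexpr uint32_t kMaskSSVE = 1u << 1;

// Stands in for the PTRACE_GETREGSET probe: returns true if the kernel
// exposes the given register set for the traced thread.
using Probe = std::function<bool(Regset)>;

// Before the patch: SSVE is only probed when the SVE probe succeeded.
uint32_t DetectNested(const Probe &probe) {
  uint32_t opt_regsets = 0;
  if (probe(Regset::SVE)) {
    opt_regsets |= kMaskSVE;
    if (probe(Regset::SSVE))
      opt_regsets |= kMaskSSVE;
  }
  return opt_regsets;
}

// After the patch: each register set is probed on its own.
uint32_t DetectIndependent(const Probe &probe) {
  uint32_t opt_regsets = 0;
  if (probe(Regset::SVE))
    opt_regsets |= kMaskSVE;
  if (probe(Regset::SSVE))
    opt_regsets |= kMaskSSVE;
  return opt_regsets;
}
```

With a probe that only reports SSVE (the M4-style case), `DetectNested` returns no register sets at all, while `DetectIndependent` still reports SSVE.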

@laverdet
Author

Closing this since Docker on macOS simply disabled SME, SVE, etc. I believe this is still an issue in theory for users with Linux installed directly on M4 hardware but if such a user exists I haven't heard of them.

@laverdet laverdet closed this Apr 14, 2025
@jasonmolenda
Collaborator

As the patch notes, Apple's M4 has the SME registers and instructions, plus Streaming SVE Mode and the SVE register set, but most of the SVE instructions are not supported. I believe the SVE registers (z0-31, p0-15) are only available when the core is in Streaming SVE Mode. I guess the main concern would be someone keying off of "this core has SVE registers" (true) and concluding "this core can run SVE API tests" (most likely false).

But as far as the patch goes, it looks good to me. While Docker might not virtualize the SME, the Darwin kernel does support this and Linux running in a VM will have access to these hardware resources on an M4 system.

@laverdet laverdet reopened this Apr 14, 2025
@DavidSpickett
Collaborator

I think I can test this on Arm's Foundation Model, I will do that and get back to you. I have not checked it before now.

@mstorsjo
Member

Closing this since Docker on macOS simply disabled SME, SVE, etc. I believe this is still an issue in theory for users with Linux installed directly on M4 hardware but if such a user exists I haven't heard of them.

Wouldn't this still be an issue if running virtualized Linux via some app other than Docker, e.g. VMware, UTM, Parallels etc., not requiring fully native Linux? (AFAIK upstream Asahi Linux doesn't yet support M3/M4.)

@mstorsjo
Member

I think I can test this on Arm's Foundation Model, I will do that and get back to you.

FWIW, it would be super convenient if QEMU could be set up to emulate this precise configuration. You don't happen to have connections to someone who could be prodded into implementing it? :-)

@mstorsjo
Member

Closing this since Docker on macOS simply disabled SME, SVE, etc. I believe this is still an issue in theory for users with Linux installed directly on M4 hardware but if such a user exists I haven't heard of them.

@jannau helped figure out a bit more on this; it's probably not Docker itself that took any action on the matter, but an updated kernel probably did. See https://lore.kernel.org/linux-arm-kernel/20250103142635.1759674-1-maz@kernel.org/ - which is backported down to 6.6 now. This patch makes sure that the kernel doesn't enable SVE2 (and other SVE-related features) unless the main SVE feature is enabled.

Therefore, this situation should only be an issue with older kernels, so perhaps it is not something that regular user mode applications should need to worry about (unless specifically wanting to run with older kernels).

That doesn't explain why SME is no longer enabled though, but that may be due to https://lore.kernel.org/qemu-devel/20250315061801.622606-21-mjt@tls.msk.ru/.

(The remaining open question is whether Windows, virtualized on similar HW, has a similar condition for their SVE feature flags.)

@DavidSpickett
Collaborator

DavidSpickett commented Apr 24, 2025

I have got Arm's Foundation Model to boot with SME only, these are the options if you are used to using shrinkwrap to run the model already:

+    -C cluster0.has_sve : 1
+    -C cluster1.has_sve : 1
+    -C cluster0.sve.has_sme2 : 0
+    -C cluster1.sve.has_sme2 : 0
+    -C cluster0.sve.has_sme : 1
+    -C cluster1.sve.has_sme : 1
+    -C cluster0.sve.has_sve2 : 1
+    -C cluster1.sve.has_sve2 : 1
+    -C cluster0.sve.sme_only : 1
+    -C cluster1.sve.sme_only : 1

(if you are not used to that, I'll write something up when I get some time)

The cpuinfo is:

Features        : fp asimd evtstrm crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop asimddp asimdfhm dit uscat ilrcpc flagm sb paca pacg gcs dcpodp flagm2 frint i8mm bf16 dgh rng bti ecv afp sme smei16i64 smef64f64 smei8i32 smef16f32 smeb16f32 smef32f32 wfxt ebf16 cssc mops hbc poe

It reports sme but no sve features, which makes sense because I built the kernel from the latest commit. I thought I was missing something else to get sve2 as was seen in the linked docker issue.

Also, SME is disabled in kernel config for unrelated reasons, so I had to re-enable that:

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a182295e6..27437f131 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2285,7 +2285,6 @@ config ARM64_SME
        bool "ARM Scalable Matrix Extension support"
        default y
        depends on ARM64_SVE
-       depends on BROKEN
        help
          The Scalable Matrix Extension (SME) is an extension to the AArch64
          execution state which utilises a substantial subset of the SVE

I will find out what QEMU can do / plans to do.

Assuming I reproduce the failures this PR aims to fix, I'll write up an issue with how to reproduce it without an actual M4 and we can mark this as the fix for it.

@DavidSpickett
Collaborator

Therefore, this situation should only be an issue with older kernels, so perhaps it is not something that regular user mode applications should need to worry about (unless specifically wanting to run with older kernels).

I will test this PR with an older kernel as well. Would be nice if it works there too.

@DavidSpickett
Collaborator

FWIW, it would be super convenient if QEMU could be set up to emulate this precise configuration. You don't happen to have connections to someone who could be prodded into implementing it? :-)

At least for Linaro, there are no plans to implement this. QEMU doesn't try to be a completely general model so I suspect until there is a common CPU that does this, or some system standard that requires it, it would not be a priority.

Someone else could try hacking it in and see if it's feasible to contribute such a mode. I have no estimate how much work it would be.

In the meantime there's Arm's Foundation Model, though it only does whole system emulation.

@DavidSpickett
Collaborator

Sorry it's taken me ages to get to this, but I have finally tested it on Arm's Foundation Model and raised #138717 to document that.

Initial impression is that this stops lldb-server crashing but there are issues debugging from there. We can consider merging this as a strict improvement over crashing on startup 🤣

But give me some time to try more examples and figure out the scale of changes to properly support this.

@DavidSpickett
Collaborator

From what I've seen, this is a decent start but there are further issues to be dealt with. Details on #138717.

I have to work on some other SME changes first, so it will be a few weeks until I can do anything for this. @laverdet if you want to pursue this yourself in the meantime, feel free to do so.

In which case you will find https://lldb.llvm.org/resources/debugging.html# useful, and you can try setting up the Foundation Model to test SVE+SME if you want, but since I'll want to test the changes myself anyway, easier to leave that to me.

@DavidSpickett
Collaborator

I think there are kernel issues that need to be fixed before all the LLDB features can work. So don't waste your own time on this right now, I will coordinate with Arm's kernel team to get this working.

@mstorsjo
Member

Therefore, this situation should only be an issue with older kernels, so perhaps it is not something that regular user mode applications should need to worry about (unless specifically wanting to run with older kernels).

I will test this PR with an older kernel as well. Would be nice if it works there too.

Did you manage to test things with an older kernel, at least on the level of what hwcaps are presented - to confirm you'd get the inconsistent hwcaps in that case (sve2 enabled, sve1 disabled)?

@DavidSpickett
Collaborator

I checked cpuinfo and hwcaps for 6.5 (doesn't have the fix) and 6.15 (does, and it was what I was using anyway). Same machine configuration, SME only.

$ uname -a
Linux e125016 6.5.0 #6 SMP PREEMPT Tue May 20 09:43:29 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

Features	: fp asimd evtstrm crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop asimddp asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti ecv afp sme smei16i64 smef64f64 smei8i32 smef16f32 smeb16f32 smef32f32 wfxt ebf16 sveebf16 cssc sme2 smei16i32 smebi32i32 mops

$ LD_SHOW_AUXV=1 sleep 1
AT_SYSINFO_EHDR:      0xffff87d8b000
AT_MINSIGSTKSZ:       4720
AT_HWCAP:             ef91ff87
AT_PAGESZ:            4096
AT_CLKTCK:            100
AT_PHDR:              0xaaaabc250040
AT_PHENT:             56
AT_PHNUM:             9
AT_BASE:              0xffff87d52000
AT_FLAGS:             0x0
AT_ENTRY:             0xaaaabc251c80
AT_UID:               1000
AT_EUID:              1000
AT_GID:               1000
AT_EGID:              1000
AT_SECURE:            0
AT_RANDOM:            0xffffc107e7f8
AT_HWCAP2:            0x9a7bf9bf383
AT_EXECFN:            /usr/bin/sleep
AT_PLATFORM:          aarch64
AT_??? (0x1b): 0x1c
AT_??? (0x1c): 0x20

The cpuinfo reports SME and SVE2, no SVE but some SVE sub features like svebf16 are there (though they might be part of SVE2).

$ uname -a
Linux e125016 6.15.0-rc1-00035-g33c4618d0ac0 #7 SMP PREEMPT Tue May 20 10:16:56 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

Features	: fp asimd evtstrm crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop asimddp asimdfhm dit uscat ilrcpc flagm sb paca pacg gcs dcpodp flagm2 frint i8mm bf16 dgh rng bti ecv afp sme smei16i64 smef64f64 smei8i32 smef16f32 smeb16f32 smef32f32 wfxt ebf16 cssc sme2 smei16i32 smebi32i32 mops hbc poe

$ LD_SHOW_AUXV=1 sleep 1
AT_SYSINFO_EHDR:      0xffff8a0e1000
AT_MINSIGSTKSZ:       4720
AT_HWCAP:             1ef91ff87
AT_PAGESZ:            4096
AT_CLKTCK:            100
AT_PHDR:              0xaaaac5360040
AT_PHENT:             56
AT_PHNUM:             9
AT_BASE:              0xffff8a0a9000
AT_FLAGS:             0x0
AT_ENTRY:             0xaaaac5361c80
AT_UID:               1000
AT_EUID:              1000
AT_GID:               1000
AT_EGID:              1000
AT_SECURE:            0
AT_RANDOM:            0xffffccef9b58
AT_HWCAP2:            0x800019a5bf9be181
AT_??? (0x1d): 0x0
AT_EXECFN:            /usr/bin/sleep
AT_PLATFORM:          aarch64
AT_??? (0x1b): 0x1c
AT_??? (0x1c): 0x20

With the fix it now reports SME and no SVE features at all.

Decoding the HWCAPS I saw this difference:

$ diff 65_features 615_features 
23a24
> HWCAP_GCS
26d26
< HWCAP2_SVE2
29,30d28
< HWCAP2_SVEI8MM
< HWCAP2_SVEBF16
47d44
< HWCAP2_SVE_EBF16
53,54c50,51
< 
< 
---
> HWCAP2_HBC
> HWCAP2_POE

Some of this is the kernel gaining new feature support. The relevant bits are that 6.15 removes HWCAP2_SVE2, HWCAP2_SVEI8MM, HWCAP2_SVEBF16 and HWCAP2_SVE_EBF16.
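
The removed bits can be recomputed directly from the two AT_HWCAP2 values in the dumps above. This is a small sketch, assuming the bit positions from the arm64 Linux uapi `hwcap.h` header; they are copied into the example so that it builds on any host, and `NamedBit`, `kSveHwcap2Bits`, and `RemovedBits` are names made up for this illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct NamedBit {
  const char *name;
  uint64_t mask;
};

// Subset of the AT_HWCAP2 bits relevant to this thread.
constexpr NamedBit kSveHwcap2Bits[] = {
    {"HWCAP2_SVE2", 1ULL << 1},
    {"HWCAP2_SVEI8MM", 1ULL << 9},
    {"HWCAP2_SVEBF16", 1ULL << 12},
    {"HWCAP2_SVE_EBF16", 1ULL << 33},
};

// Returns the names of the listed bits that are set in `before` but
// clear in `after`.
std::vector<std::string> RemovedBits(uint64_t before, uint64_t after) {
  std::vector<std::string> names;
  const uint64_t removed = before & ~after;
  for (const NamedBit &bit : kSveHwcap2Bits)
    if (removed & bit.mask)
      names.push_back(bit.name);
  return names;
}
```

Calling `RemovedBits(0x9a7bf9bf383ULL, 0x800019a5bf9be181ULL)` with the 6.5 and 6.15 values yields the same four names as the diff of decoded features.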

And side note: a kernel developer told me that you can simulate this in qemu if you tell the kernel to hide the SVE feature using some sort of boot parameter. I haven't found out what yet, but the effect is equivalent for this purpose.

@DavidSpickett
Collaborator

So if you have code that wants to use SVE2 and may run on an SME only device with this older kernel, I think you could make it check for HWCAP_SVE and HWCAP2_SVE2. As the Architecture manual says:

FEAT_SVE2, Scalable Vector Extension version 2
<...>
If FEAT_SVE2 is implemented, then FEAT_SVE is implemented.

Meaning that from userspace:

  • HWCAP_SVE = sve only
  • HWCAP2_SVE2 + HWCAP_SVE = sve and sve2 (and if you have SME, in streaming mode too)
  • HWCAP2_SVE2 = sve only in streaming mode (because you can't have SVE2 only, unless you have SME and this buggy kernel version)

If you are making changes like this, I can double check with Arm's kernel team that your approach is what they expect software to do. I think what I've suggested would work, but not sure that the kernel authors want us doing it that way.
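
The three cases listed above could be expressed as a small classifier. This is only a sketch of the suggestion, not something the kernel authors have confirmed: the bit positions are assumed from the arm64 Linux uapi header and redefined locally, and `SveSupport` and `ClassifySve` are hypothetical names.

```cpp
#include <cstdint>

constexpr uint64_t kHwcapSve = 1ULL << 22;  // HWCAP_SVE, from AT_HWCAP
constexpr uint64_t kHwcap2Sve2 = 1ULL << 1; // HWCAP2_SVE2, from AT_HWCAP2

enum class SveSupport {
  None,          // neither flag reported
  SveOnly,       // HWCAP_SVE alone
  SveAndSve2,    // both flags
  StreamingOnly, // HWCAP2_SVE2 alone: SME-only core on an unfixed kernel
};

SveSupport ClassifySve(uint64_t hwcap, uint64_t hwcap2) {
  const bool sve = (hwcap & kHwcapSve) != 0;
  const bool sve2 = (hwcap2 & kHwcap2Sve2) != 0;
  if (sve && sve2)
    return SveSupport::SveAndSve2;
  if (sve)
    return SveSupport::SveOnly;
  if (sve2)
    return SveSupport::StreamingOnly;
  return SveSupport::None;
}
```

On a real system the two inputs would come from `getauxval(AT_HWCAP)` and `getauxval(AT_HWCAP2)`. Fed the values from the 6.5 dump earlier in this thread, the function returns `StreamingOnly`; with the 6.15 values it returns `None`.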

@mstorsjo
Member

So if you have code that wants to use SVE2 and may run on an SME only device with this older kernel, I think you could make it check for HWCAP_SVE and HWCAP2_SVE2. As the Architecture manual says:

FEAT_SVE2, Scalable Vector Extension version 2
<...>
If FEAT_SVE2 is implemented, then FEAT_SVE is implemented.

Meaning that from userspace:

  • HWCAP_SVE = sve only
  • HWCAP2_SVE2 + HWCAP_SVE = sve and sve2 (and if you have SME, in streaming mode too)
  • HWCAP2_SVE2 = sve only in streaming mode (because you can't have SVE2 only, unless you have SME and this buggy kernel version)

If you are making changes like this, I can double check with Arm's kernel team that your approach is what they expect software to do. I think what I've suggested would work, but not sure that the kernel authors want us doing it that way.

Thanks!

So for me it’s mainly whether I should pursue changes like https://code.videolan.org/videolan/dav1d/-/merge_requests/1787 in dav1d, ffmpeg and x264. (Individual functions only check the sve2 flag internally, e.g. https://code.videolan.org/videolan/dav1d/-/blob/1.5.1/src/arm/mc.h?ref_type=tags#L101. Changing all such occurrences to check both sve and sve2 flags internally would be brittle. Therefore I could maybe do what that merge request does, to account for this when the flags are set up internally.)
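
One way to account for this at flag setup time, rather than in every function, might look like the following. It is a hedged sketch and not the code from the linked merge request: `kCpuFlagSve`, `kCpuFlagSve2`, and `NormalizeCpuFlags` are hypothetical names, not dav1d's actual definitions.

```cpp
// Hypothetical internal CPU feature flags for this illustration.
constexpr unsigned kCpuFlagSve = 1u << 0;
constexpr unsigned kCpuFlagSve2 = 1u << 1;

// Clears the SVE2 flag when SVE is not reported, so that per-function
// checks can keep testing the SVE2 flag on its own.
inline unsigned NormalizeCpuFlags(unsigned flags) {
  if ((flags & kCpuFlagSve) == 0)
    flags &= ~kCpuFlagSve2;
  return flags;
}
```

Doing the masking once where the flags are initialized keeps the individual function selection code unchanged.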

But as this is only a historical issue with older kernels, I’m leaning towards just skipping it - at least until some user reports actually hitting it.

@DavidSpickett
Collaborator

Sounds sensible. If docker has bundled the fixed kernel, then that's most of the M4 users covered and anyone else can update their kernel.

@DavidSpickett
Collaborator

@laverdet I've been told we need kernel changes to handle parts of this. Those are planned, and I will work on the lldb side once they are available.

In the meantime, this patch does prevent lldb crashing but I'm not comfortable merging it when other features won't work. If we get to the next release time and we don't have a complete solution we can consider whether to commit this as a temporary work around.

If you do use lldb with this patch, I've no doubt you'll find other problems so please add them to #138717 so I know to check them with the new changes.
