Ignore disabled hyperthreads #291

Flamefire · 2025-05-06T11:59:07Z

Only consider processors with an APIC_ID as valid.

For processors with Hyperthreads but disabled SMT the APIC_ID will never be set by CPUInfo and stays zero for the "disabled" threads. Among other inconsistencies this causes the first active thread/processor to be considered "the same" as all the disabled ones as they share APIC_ID=0

It also doesn't make sense to make and decisions based on the APIC_ID if we don't have any information on it.

So this patch adds a check for CPUINFO_LINUX_FLAG_APIC_ID.
The flag CPUINFO_LINUX_FLAG_APIC_ID will be set if and only if the APIC_ID has been set. If it has not then CPUInfo doesn't has any information about the APIC_ID, hence the default value of zero. See

cpuinfo/src/x86/linux/cpuinfo.c

Lines 68 to 69 in 39ea79a

    
           processor->apic_id = apic_id; 
        
           processor->flags |= CPUINFO_LINUX_FLAG_APIC_ID;

On a dual socket AMD EPYC 9334 system (2x32 cores, 128 threads total (2/core)) the current output of cache-info is

Max cache size (upper bound): 33554432 bytes
L1 instruction cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 65 processors
L1 data cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 65 processors
L2 unified cache: 64 x 1 MB (inclusive), 8-way set associative (2048 sets), 64 byte lines, shared by 65 processors
L3 unified cache: 8 x 32 MB (exclusive), 16-way set associative (32768 sets), 64 byte lines, shared by 72 processors

Note the "65 processors" being one-off.

With this patch:

Max cache size (upper bound): 33554432 bytes
L1 instruction cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 1 processors
L1 data cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 1 processors
L2 unified cache: 64 x 1 MB (inclusive), 8-way set associative (2048 sets), 64 byte lines, shared by 1 processors
L3 unified cache: 8 x 32 MB (exclusive), 16-way set associative (32768 sets), 64 byte lines, shared by 8 processor

This matches the Zen4 architecture having 32KB Data and 32KB instruction L1 cache per core, 1MB L2 cache per core and 256MB L3 cache total.

Fixes #238

malfet · 2025-05-12T17:08:37Z

@Flamefire can you please fix lint? (namely run clang-format on your PR)
Also, do you mind adding a few more references in the doc to the PR description that will help with reviewing this change

Flamefire · 2025-05-13T07:56:33Z

@Flamefire can you please fix lint? (namely run clang-format on your PR) Also, do you mind adding a few more references in the doc to the PR description that will help with reviewing this change

Fixed the clang format issue and squashed it into the initial commit

I'm not sure what to put into the description in addition to what I have. I added a paragraph linking to the logic of CPUInfo itself. I.e. the issue is that while CPUInfo "knows" that it doesn't has information on the APIC ID it still uses it as-if it was valid.
As for WHY the information is not available I wasn't able to find any reliable information. But the change at least makes sense from the perspective of the existing logic and observations.

malfet · 2025-05-22T23:46:05Z

@pytorchbot rebase

Only consider processors with an APIC_ID as valid. For processors with Hyperthreads but disabled SMT the APIC_ID will never be set and stays zero for the "disabled" threads. Among other inconsistencies this causes the first active thread/processor to be considered "the same" as all the disabled ones as they share `APIC_ID=0` It also doesn't make sense to make and decisions based on the APIC_ID if we don't have any information on it.

Flamefire · 2025-05-23T07:33:07Z

Rebased manually. Not sure why the bot didn't do it.

fbarchard · 2025-06-06T22:10:32Z

#238 is for nnpack? Did you mean xnnpack?
The change looks okay.

I've seen similar issues on arm where cores are unavailable, but for different reasons
on low end devices such as nexus 5, overheating would turn off cpus, starting with the big cores.
on high end devices, but running on 32 bit, only medium cores are available.. the little and big/prime cores are disabled.
the cores are there, but not available, so we would prefer count the number of cores actually available, when deciding how many threads to use, and make achitectual choices based on the best available core

Flamefire · 2025-06-08T08:34:58Z

#238 is for nnpack? Did you mean xnnpack? The change looks okay.

NNPACK is correct, that's where I ran into a crash due to this. See the issue there: Maratyszcza/NNPACK#218

fbarchard · 2025-11-12T03:15:25Z

This PR caused crashes on Android and Linux platforms
#339

So I've prepared a PR to rollback to the previous version
#340

Flamefire · 2025-11-12T07:50:56Z

Are you sure this is caused by this PR? I can't see how.

Stacktrace:

#00 pc 0910812b cpuinfo_compute_max_cache_size+37
#01 pc 091069a7 cpuinfo_x86_linux_init+4504

This leads to this call: https://github.com/Flamefire/cpuinfo/blob/b9558e923a0f915ddc2357116abc0d6aaaaea4e8/src/x86/linux/init.c#L638 and then this function:

cpuinfo/src/cache.c

Line 6 in 735b2ac

    
           uint32_t cpuinfo_compute_max_cache_size(const struct cpuinfo_processor* processor) {

There can only be a segfault if processor is invalid, which it is clearly not in this case. Or if one of the cache pointers is pointing to a wrong address.

Those pointers are set to members of an array whose sizes are calculated by cpuinfo_x86_count_objects using the same logic as that code that sets the pointers.
Additionally those pointers are assigned to in cpuinfo_x86_linux_init already so if writing to those doesn't crash, how could reading them?

Any idea what actually causes the crash on Android?

Because this code works for me on multiple (non-Android) systems.

facebook-github-bot added the cla signed label May 6, 2025

This was referenced May 6, 2025

Incorrect cache-info output with nosmt linux kernel command line #238

Open

Floating Point exception in Convolution with disabled SMT pytorch/pytorch#153139

Open

Flamefire force-pushed the patch-1 branch from 0cdd479 to f35c09e Compare May 13, 2025 07:41

Flamefire force-pushed the patch-1 branch from f35c09e to b9558e9 Compare May 23, 2025 06:44

fbarchard approved these changes Jun 6, 2025

View reviewed changes

GregoryComer approved these changes Nov 5, 2025

View reviewed changes

GregoryComer merged commit 0d5985d into pytorch:main Nov 5, 2025
21 of 22 checks passed

Flamefire deleted the patch-1 branch November 5, 2025 18:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Ignore disabled hyperthreads #291

Ignore disabled hyperthreads #291

Uh oh!

Flamefire commented May 6, 2025 •

edited

Loading

Uh oh!

malfet commented May 12, 2025

Uh oh!

Flamefire commented May 13, 2025

Uh oh!

malfet commented May 22, 2025

Uh oh!

Flamefire commented May 23, 2025

Uh oh!

fbarchard commented Jun 6, 2025

Uh oh!

Flamefire commented Jun 8, 2025 •

edited

Loading

Uh oh!

Uh oh!

fbarchard commented Nov 12, 2025

Uh oh!

Flamefire commented Nov 12, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

	processor->apic_id = apic_id;
	processor->flags \|= CPUINFO_LINUX_FLAG_APIC_ID;

Ignore disabled hyperthreads #291

Ignore disabled hyperthreads #291

Uh oh!

Conversation

Flamefire commented May 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

malfet commented May 12, 2025

Uh oh!

Flamefire commented May 13, 2025

Uh oh!

malfet commented May 22, 2025

Uh oh!

Flamefire commented May 23, 2025

Uh oh!

fbarchard commented Jun 6, 2025

Uh oh!

Flamefire commented Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

fbarchard commented Nov 12, 2025

Uh oh!

Flamefire commented Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Flamefire commented May 6, 2025 •

edited

Loading

Flamefire commented Jun 8, 2025 •

edited

Loading

Flamefire commented Nov 12, 2025 •

edited

Loading