tests: z_test_1cpu_start() makes only CPU0 active #70579

Merged

peter-mitsis
Collaborator

When z_test_1cpu_start() is called to ensure that only a single CPU on an SMP system is available for use in a test, this commit ensures that the remaining CPU is the primary CPU, CPU0. This is done because some timer drivers have the timer interrupt processed by only one CPU.

A bit of a song and dance is performed to achieve this without enabling the CPU mask/affinity pinning API. If a cpuhold thread is found to be executing on CPU0, a new copy of the cpuhold thread is created. Once the new copy is executing (which is guaranteed to be on another CPU), it informs the original copy and busy-waits until the original copy is switched out of CPU0. At this point, we can create the next cpuhold thread to occupy another CPU if needed.
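
The handoff described above can be modeled in plain C without any Zephyr APIs. The following is only a rough sketch under stated assumptions, not the code from this commit: `struct cpu_model`, `place_thread()`, `hold_secondary_cpus()`, the four-CPU count, and the `hints` array (which stands in for wherever the scheduler happens to start each new thread) are all hypothetical names invented for illustration.

```c
#include <assert.h>

#define NUM_CPUS 4      /* assumed CPU count for the model */
#define NO_THREAD (-1)  /* marks a CPU with no hold thread on it */

/* Hypothetical model state: which hold thread, if any, occupies each CPU. */
struct cpu_model {
	int owner[NUM_CPUS];
	int next_id;
};

void cpu_model_init(struct cpu_model *m)
{
	for (int i = 0; i < NUM_CPUS; i++) {
		m->owner[i] = NO_THREAD;
	}
	m->next_id = 0;
}

/*
 * Stand-in for the scheduler: run a new top-priority thread on the first
 * CPU, searching from `hint`, that no hold thread occupies. Returns the
 * chosen CPU, or -1 if every CPU is occupied.
 */
static int place_thread(struct cpu_model *m, int hint)
{
	for (int i = 0; i < NUM_CPUS; i++) {
		int cpu = (hint + i) % NUM_CPUS;

		if (m->owner[cpu] == NO_THREAD) {
			m->owner[cpu] = m->next_id++;
			return cpu;
		}
	}
	return -1;
}

/*
 * Occupy every CPU except CPU0. hints[] has NUM_CPUS - 1 entries, each in
 * the range 0..NUM_CPUS-1. Returns how many times a copy had to be created.
 */
int hold_secondary_cpus(struct cpu_model *m, const int *hints)
{
	int copies = 0;

	for (int held = 0; held < NUM_CPUS - 1; held++) {
		int cpu = place_thread(m, hints[held]);

		assert(cpu >= 0);
		if (cpu == 0) {
			/* Original is on CPU0, so the copy must land elsewhere. */
			int copy_cpu = place_thread(m, 0);

			assert(copy_cpu > 0);
			/* Copy has reported in; the original now leaves CPU0. */
			m->owner[0] = NO_THREAD;
			copies++;
		}
	}
	return copies;
}
```

Whatever hints are supplied, the model finishes with CPU0 unoccupied and every other CPU held, because each iteration that lands on CPU0 trades the original for a copy on a secondary CPU before moving on.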

During this song and dance, it is critical that the 'copy' not pend. If it pends, we cannot guarantee which CPU it will execute on when it unpends. As the cpuhold threads have the highest priority, nothing is going to cause them to execute on another CPU for as long as they do not pend.
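
One way to picture the "do not pend" rule is a handshake built only from atomic flags and a spin loop, with no semaphore or other blocking primitive on the copy's side. The sketch below uses C11 `<stdatomic.h>` rather than Zephyr's own atomic API, and `struct handoff` along with every function name in it is a hypothetical illustration, not taken from the commit. In a real system the copy-side and original-side functions would run on different CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CPU0_IDLE (-1)  /* assumed marker: no hold thread on CPU0 */

/* Hypothetical shared state for one original/copy pair. */
struct handoff {
	atomic_bool copy_running;  /* set by the copy once it executes */
	atomic_int cpu0_thread;    /* id of the hold thread on CPU0 */
};

void handoff_init(struct handoff *h, int original_id)
{
	atomic_init(&h->copy_running, false);
	atomic_init(&h->cpu0_thread, original_id);
}

/* Copy side: announce that the copy is executing. Never blocks. */
void copy_announce(struct handoff *h)
{
	atomic_store(&h->copy_running, true);
}

/*
 * Copy side: spin (do not pend) until the original has left CPU0.
 * Returns how many polls found the original still there.
 */
unsigned long copy_spin_until_cpu0_free(struct handoff *h, int original_id)
{
	unsigned long spins = 0;

	while (atomic_load(&h->cpu0_thread) == original_id) {
		spins++;
	}
	return spins;
}

/*
 * Original side: give up CPU0 only after the copy has announced itself.
 * Returns true if the original left CPU0.
 */
bool original_try_leave_cpu0(struct handoff *h)
{
	if (!atomic_load(&h->copy_running)) {
		return false;
	}
	atomic_store(&h->cpu0_thread, CPU0_IDLE);
	return true;
}
```

Because the copy only ever polls, it never enters a wait queue, so the scheduler has no opportunity to resume it on a different CPU.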

Fixes #70494

@nashif
Member

nashif commented Mar 22, 2024

@peter-mitsis this does not seem to solve the issue. Running tests/kernel/workq/work/kernel.workqueue.api, it gets stuck at DEBUG - DEVICE: START - test_1cpu_basic_reschedule, the same as without this patch.

@peter-mitsis
Collaborator Author

That is unfortunate. Digging a little more into this one.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
@peter-mitsis
Collaborator Author

Found the issue with the previous version. I was tagging the wrong thread resource as used, so the second time it found that the cpu_hold thread was on CPU0, it reused an existing k_thread, and hijinks ensued.

This one should work.
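
To illustrate the kind of bookkeeping involved, here is a minimal sketch of a slot pool in which the slot that gets marked as used is always the same slot that is handed out. The pool layout, `struct hold_slot`, and both functions are assumptions made for this example; the comment above does not say how the real thread resources are organized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4  /* assumed number of spare thread slots */

/* Hypothetical slot: thread_data stands in for a k_thread and its stack. */
struct hold_slot {
	bool used;
	int thread_data;
};

/*
 * Hand out the first free slot and mark that same slot as used.
 * Marking any other slot would let a later call return a slot that is
 * still backing a live thread. Returns NULL when the pool is exhausted.
 */
struct hold_slot *hold_slot_alloc(struct hold_slot *pool, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (!pool[i].used) {
			pool[i].used = true;
			return &pool[i];
		}
	}
	return NULL;
}

/* Return a slot to the pool once its thread has finished. */
void hold_slot_free(struct hold_slot *slot)
{
	assert(slot != NULL && slot->used);
	slot->used = false;
}
```

With a zero-initialized `struct hold_slot pool[POOL_SIZE]`, two consecutive calls to `hold_slot_alloc(pool, POOL_SIZE)` return distinct slots, which is the property the earlier version of the patch lacked.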

@dleach02 dleach02 merged commit 87ca079 into zephyrproject-rtos:main Mar 26, 2024
21 checks passed
Development

Successfully merging this pull request may close these issues.

intel_adsp: many tests fail due to change to timer interrupts on secondary cores not being enabled
5 participants