mem-ruby: Prevent LL/SC livelock in MESI protocols (#1384) #1399

Merged: 1 commit merged into gem5:develop on Oct 28, 2024

Conversation

hexengraf
Contributor

Fix #1384.

The MESI_Two_Level and MESI_Three_Level protocols are susceptible to LL/SC livelocks when simulating boards with a high core count.

This fix is based on MOESI_CMP_directory's implementation of locked states, but applies the locking only when a Load-Linked is initiated.

There are two new states to act as locked states and stall any messages leading to eviction:

  • LLSC_E: equivalent to the E state; goes to E after the timeout.
  • LLSC_M: equivalent to the M state; goes to M after the timeout.
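
To make the locked-state behavior concrete, here is a minimal Python sketch. It is a toy model, not the SLICC code from this PR: only the state names LLSC_E, LLSC_M, E and M come from the description, while the `Line` class, its methods, and the tick-based timing are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field

# Toy model only: the state names follow the PR description,
# everything else (class, methods, ticks) is hypothetical.
UNLOCK = {"LLSC_E": "E", "LLSC_M": "M"}


@dataclass
class Line:
    state: str = "I"
    unlock_tick: int = 0
    stalled: deque = field(default_factory=deque)

    def lock(self, now, timeout):
        """Enter the locked twin of the current E or M state."""
        if self.state not in ("E", "M"):
            raise ValueError("only E or M lines can be locked")
        self.state = "LLSC_" + self.state
        self.unlock_tick = now + timeout

    def on_invalidate(self, requester):
        """Handle a message that would lead to eviction of the line."""
        if self.state in UNLOCK:
            self.stalled.append(requester)  # stalled while locked
            return "stalled"
        self.state = "I"
        return "invalidated"

    def tick(self, now):
        """After the timeout, fall back to E/M and return stalled requesters."""
        if self.state in UNLOCK and now >= self.unlock_tick:
            self.state = UNLOCK[self.state]
            woken = list(self.stalled)
            self.stalled.clear()
            return woken
        return []
```

In this sketch a locked line queues any invalidating request instead of giving up the block, and `tick` hands the queued requesters back for replay once the lock expires.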

The main new event is Load_Linked, which behaves much like a Store and reuses several of its transient states. When a controller receives the exclusive data, it tells a Load_Linked apart from a Store by checking a new field added to the TBE: 'isLoadLinked'. For a Load_Linked it triggers a different event, which in turn causes the transition to one of the locked states.
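
The dispatch on the TBE flag can be sketched as follows. This is an illustrative Python model rather than the protocol's SLICC: `is_load_linked` mirrors the 'isLoadLinked' field named above, but the event names `Data_Exclusive` and `Data_Exclusive_LL`, the helper functions, and the `plain_state` argument are placeholders and not taken from the patch.

```python
from dataclasses import dataclass


@dataclass
class TBE:
    # Mirrors the 'isLoadLinked' field described in the PR.
    is_load_linked: bool = False


def start_request(kind):
    """Load_Linked and Store share the same transient path here;
    only the TBE flag records which one started the request."""
    if kind not in ("Load_Linked", "Store"):
        raise ValueError(kind)
    return TBE(is_load_linked=(kind == "Load_Linked"))


def on_exclusive_data(tbe, plain_state):
    """Choose the follow-up event and next state when exclusive data arrives.

    plain_state is the state ("E" or "M") the line would otherwise reach.
    Event names are hypothetical placeholders.
    """
    if plain_state not in ("E", "M"):
        raise ValueError(plain_state)
    if tbe.is_load_linked:
        return "Data_Exclusive_LL", "LLSC_" + plain_state
    return "Data_Exclusive", plain_state
```

The point of the flag is that the request path does not need a parallel set of transient states: the decision is deferred until the data response arrives.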

The entire mechanism can be turned off by setting 'use_llsc_lock' to false; 'llsc_lock_timeout_latency' defines how long a line stays locked.
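
The effect of the two parameters can be illustrated with a toy two-core simulation in Python. Only the parameter names 'use_llsc_lock' and 'llsc_lock_timeout_latency' come from this PR; the `LLSCConfig` class, the `run` function, the tick units, and the retry pattern (a failed SC immediately reissues the LL) are assumptions, and how the parameters are set on the real controllers is not shown here.

```python
from dataclasses import dataclass


@dataclass
class LLSCConfig:
    # Parameter names follow the PR; the default values are arbitrary.
    use_llsc_lock: bool = True
    llsc_lock_timeout_latency: int = 4  # in toy ticks


def run(cfg, max_ticks=100):
    """Two cores alternate turns; each does LL, then SC on its next turn.

    Returns the tick at which both SCs have succeeded, or None if
    that never happens within max_ticks (the livelock case).
    """
    owner = None                 # core currently holding the line
    lock_until = 0               # tick at which the owner's lock expires
    monitor = [False, False]     # per-core LL reservation
    pending_sc = [False, False]  # True once a core has issued its LL
    done = [False, False]
    for tick in range(max_ticks):
        core = tick % 2
        other = 1 - core
        if done[core]:
            continue
        if pending_sc[core]:
            pending_sc[core] = False
            if monitor[core]:
                done[core] = True  # SC succeeded
                if all(done):
                    return tick
                continue
            # SC failed: fall through and retry the LL right away.
        if cfg.use_llsc_lock and owner == other and tick < lock_until:
            continue  # LL stalled by the other core's locked line
        # LL takes the line and clears the other core's reservation.
        owner = core
        monitor[other] = False
        monitor[core] = True
        lock_until = tick + cfg.llsc_lock_timeout_latency
        pending_sc[core] = True
    return None
```

In this toy, `run(LLSCConfig(use_llsc_lock=False))` never completes because each LL steals the line before the other core's SC, while the default configuration lets both cores finish. A timeout shorter than the LL-to-SC window (for example 1 tick here) fails in the same way as the disabled case, which is why the latency is exposed as a tunable.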

Change-Id: I13f415b6b7890d51d01f23001047d2363467a814

@BobbyRBruce BobbyRBruce requested a review from powerjg July 31, 2024 11:18
Member

@BobbyRBruce BobbyRBruce left a comment

I'm giving this a cautious approval. I would like someone with experience with MESI to also approve before merging.

@hexengraf hexengraf force-pushed the mem-ruby-mesi-llsc-fix branch from 79288ba to 49a7b7a Compare July 31, 2024 15:45
@ivanaamit ivanaamit added the mem-ruby Ruby caches, structures, and protocols label Aug 1, 2024
@powerjg
Contributor

powerjg commented Aug 9, 2024

Looks OK to me. Since it was an Arm bug, @giactra may want to take a look.

@ivanaamit ivanaamit added arch-arm The ARM ISA cpu-o3 gem5's Out-Of-Order CPU labels Aug 21, 2024
@ivanaamit
Contributor

ivanaamit commented Aug 21, 2024

Hi @giactra, since this change is related to ARM, you might want to review it.

@giactra
Contributor

giactra commented Aug 22, 2024

Hi @giacomo, since this change is related to ARM, you might want to review it.

Thanks @ivanaamit ; you might have tagged the wrong Giacomo though ;)

@ivanaamit
Contributor

ivanaamit commented Sep 12, 2024

Hi @giactra, if you have time, we’d appreciate it if you could review this PR. Even a quick review would be very helpful. Thank you!

@BobbyRBruce
Member

@giactra : This affects ARM. Could you review this? We're ready to merge but would appreciate ARM approval.

@ivanaamit ivanaamit merged commit 7bddc76 into gem5:develop Oct 28, 2024
36 checks passed
@hexengraf hexengraf deleted the mem-ruby-mesi-llsc-fix branch January 23, 2025 21:06
Labels
arch-arm The ARM ISA cpu-o3 gem5's Out-Of-Order CPU mem-ruby Ruby caches, structures, and protocols
Development

Successfully merging this pull request may close these issues.

Simulation stuck in LL/SC livelock when using ARM+O3 CPUs with MESI_Three_Level and MESI_Two_Level protocols
5 participants