RBMC: Check again for dead sibling service #77

Open: wants to merge 1 commit into base: 1120

spinler (Contributor) commented Mar 13, 2025

During bad-path testing, the sibling daemon on each BMC would pass the existing check that confirms it is running and then die. The wait for the sibling interface to appear on D-Bus would then time out. At that point each BMC became active, because it assumed the sibling daemon was fine and that only the sibling BMC had a problem.

Fix this by checking again whether the sibling daemon is running when the sibling interface still isn't on D-Bus after the wait. If the daemon isn't running, become passive.
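
The decision described here can be sketched as a small function. This is a minimal illustration in Python, not the code in this change: the function name `recheck_sibling_service`, its parameters, and the assumption that only a service state of `"active"` counts as running are all hypothetical. The role name and reason string are taken from the log output in this description.

```python
from typing import Optional, Tuple

# Role name as it appears in the log output of this PR.
ROLE_PASSIVE = "xyz.openbmc_project.State.BMC.Redundancy.Role.Passive"


def recheck_sibling_service(
    interface_present: bool, service_state: str
) -> Optional[Tuple[str, str]]:
    """Second check of the sibling daemon, run after the D-Bus wait ends.

    Returns (role, reason) when the recheck forces a role, or None when
    the recheck does not apply and the existing role logic (not modeled
    here) should decide.
    """
    if interface_present:
        # The sibling interface showed up in time, so there is nothing
        # to recheck.
        return None

    # Assumption: "active" is the only state treated as running.
    if service_state != "active":
        return (ROLE_PASSIVE, "Sibling BMC service is not running")

    # The daemon is still running, so the missing interface is attributed
    # to the sibling BMC itself and the existing logic takes over.
    return None


if __name__ == "__main__":
    # Mirrors the tested scenario: interface absent, service failed.
    print(recheck_sibling_service(False, "failed"))
```

With the interface absent and the service state `failed`, the sketch selects the passive role, which matches the scenario in the test log. In every other case it defers to the role logic that already exists.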

Tested:

This is seen on each BMC:

```
Waiting for sibling interface and/or heartbeat: Present = False, Heartbeat = False
Done waiting for sibling. Interface present = False, heartbeat = False
Sibling service state is failed
Role = xyz.openbmc_project.State.BMC.Redundancy.Role.Passive due to: Sibling BMC service is not running
```

Signed-off-by: Matt Spinler <spinler@us.ibm.com>