
UpdateIAmAlive retries indefinitely even though it won't be able to succeed #7784

Open
@oleggolovkov

Description

I hit this issue with Redis clustering, but I believe it may be more generic.

Steps to reproduce:

  1. Spin up a typical cluster that uses non-localhost clustering and let it run for a while.
  2. Make the entry for one of the silos disappear from the clustering storage without touching the cluster itself. If ADO.NET clustering is used, one way to do this is to DELETE that silo's row from the membership table. Another way (which is what happens in my case) is to use Redis clustering without Redis persistence and simply restart the Redis instance.
  3. Observe exceptions like the following being logged every few seconds, indefinitely:
    Failed to update table entry for this silo, will retry shortly: "Orleans.Clustering.Redis.RedisClusteringException: Could not find a value for the key S10.0.1.101:8952:392386255
       at Orleans.Clustering.Redis.RedisMembershipTable.UpdateIAmAlive(MembershipEntry entry)
       at Orleans.Runtime.MembershipService.MembershipTableManager.UpdateIAmAlive() in /_/src/Orleans.Runtime/MembershipService/MembershipTableManager.cs:line 201
       at Orleans.Runtime.MembershipService.MembershipAgent.UpdateIAmAlive() in /_/src/Orleans.Runtime/MembershipService/MembershipAgent.cs:line 72"

Actual result:
Orleans.Runtime.MembershipService.MembershipAgent enters a retry loop that it cannot break out of on its own.

Expected result:
Orleans.Runtime.MembershipService.MembershipAgent recognizes permanent failures (in my example, the error message contains "Could not find a value for the key") and does not keep retrying them (in my example, the log explicitly says "will retry shortly"). Instead, it does something else, such as trying to reconfigure the whole cluster or treating itself as a newly joining silo.
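A minimal sketch of the expected behavior, written in Python as a language-neutral illustration rather than against Orleans' actual C# API. All names here (`EntryNotFoundError`, `TransientStorageError`, `i_am_alive_loop`, `on_permanent_failure`) are hypothetical. The sketch assumes the storage layer can distinguish "my entry is gone" (which would correspond to the `RedisClusteringException` with "Could not find a value for the key" above) from transient failures; what recovery `on_permanent_failure` should actually perform is the open question of this issue.

```python
import time
from typing import Callable, List


class EntryNotFoundError(Exception):
    """Hypothetical: the silo's row/key no longer exists in membership storage."""


class TransientStorageError(Exception):
    """Hypothetical: a failure that may succeed on retry (timeout, dropped connection)."""


def i_am_alive_loop(
    update: Callable[[], None],
    on_permanent_failure: Callable[[Exception], None],
    max_iterations: int,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `update` periodically; retry transient errors, stop on permanent ones.

    Returns "stopped" if a permanent failure ended the loop, or "completed"
    after max_iterations attempts. Any other exception type propagates.
    """
    for _ in range(max_iterations):
        try:
            update()
        except EntryNotFoundError as exc:
            # Retrying cannot recreate the missing entry, so hand off to a
            # recovery path (e.g. rejoin as a new silo) instead of looping forever.
            on_permanent_failure(exc)
            return "stopped"
        except TransientStorageError:
            # Worth retrying: the storage may recover on its own.
            pass
        sleep(delay_seconds)
    return "completed"


# Usage: two transient failures, then the entry disappears.
calls: List[int] = []


def flaky_update() -> None:
    calls.append(1)
    if len(calls) < 3:
        raise TransientStorageError("timeout")
    raise EntryNotFoundError("Could not find a value for the key S10.0.1.101:8952:392386255")


handled: List[Exception] = []
result = i_am_alive_loop(flaky_update, handled.append, max_iterations=100, sleep=lambda s: None)
print(result, len(calls))  # stopped 3
```

The point of the design is that the retry decision depends on the failure's category, not just on the fact that an exception occurred, so a missing entry ends the loop after one attempt instead of producing the same log line every few seconds.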
