Description
I ran into this issue with Redis clustering, but I believe it is more generic and may affect other clustering providers too.
Steps to reproduce:
- Spin up a typical cluster with non-localhost clustering and let it run for some time
- Remove the entry for one of the silos from clustering storage without touching the cluster itself. One way to do this is to DELETE the silo's row from the table if ADO.NET clustering is used. Another way (which is what happens in my case) is to run Redis clustering with no 'Redis persistence' and simply reboot the Redis instance
- Observe exceptions like this one recurring every few seconds, indefinitely:
Failed to update table entry for this silo, will retry shortly: "Orleans.Clustering.Redis.RedisClusteringException: Could not find a value for the key S10.0.1.101:8952:392386255
   at Orleans.Clustering.Redis.RedisMembershipTable.UpdateIAmAlive(MembershipEntry entry)
   at Orleans.Runtime.MembershipService.MembershipTableManager.UpdateIAmAlive() in /_/src/Orleans.Runtime/MembershipService/MembershipTableManager.cs:line 201
   at Orleans.Runtime.MembershipService.MembershipAgent.UpdateIAmAlive() in /_/src/Orleans.Runtime/MembershipService/MembershipAgent.cs:line 72"
Actual result:
Orleans.Runtime.MembershipService.MembershipAgent enters a retry loop that it cannot get out of on its own.
Expected result:
Orleans.Runtime.MembershipService.MembershipAgent recognizes permanent failures (in my example the error message contains "Could not find a value for the key") and does not retry them (in my example the log explicitly says it will: "will retry shortly"). Instead it does something else, such as trying to reconfigure the whole cluster or treating itself as a newly joining silo.
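
To make the expected behavior concrete, here is a minimal, language-agnostic sketch in Python. It is not Orleans code. `InMemoryMembershipTable`, `EntryNotFound`, `TransientStorageError`, and `heartbeat` are hypothetical names made up for illustration. The sketch assumes the storage layer can report "entry missing" as a distinct error type, and that re-inserting the entry is an acceptable recovery. Whether that is actually safe in Orleans (as opposed to, say, restarting the silo) is exactly the open question of this issue.

```python
class EntryNotFound(Exception):
    """Permanent failure: the silo's entry is gone from clustering storage."""


class TransientStorageError(Exception):
    """Temporary failure (for example a timeout); retrying may succeed."""


class InMemoryMembershipTable:
    """Hypothetical stand-in for a clustering store such as Redis."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, now):
        self.rows[key] = now

    def update_i_am_alive(self, key, now):
        if key not in self.rows:
            raise EntryNotFound(f"Could not find a value for the key {key}")
        self.rows[key] = now


def heartbeat(table, key, now):
    """Run one heartbeat tick and report what happened."""
    try:
        table.update_i_am_alive(key, now)
        return "updated"
    except EntryNotFound:
        # Retrying the same update can never succeed, so re-register
        # (assumed recovery strategy, see the note above).
        table.insert(key, now)
        return "rejoined"
    except TransientStorageError:
        # Only this class of error is worth retrying on the next tick.
        return "retry"


table = InMemoryMembershipTable()
key = "S10.0.1.101:8952:392386255"
table.insert(key, now=0)
table.rows.clear()  # simulate Redis rebooting without persistence
print(heartbeat(table, key, now=1))  # rejoined
print(heartbeat(table, key, now=2))  # updated
```

The point of the sketch is only the split between the two exception types: the missing-entry case gets a different code path from the "will retry shortly" path, so the loop terminates after one tick instead of repeating forever.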