Description
Issue
When a user calls PATCH v3/apps/:guid
to rename an app while the bbs instance is unavailable (for example during a BBS failover), the request fails with 503 - Runner is unavailable
, even though the app is renamed.
Context
The behaviour described under Issue was possibly introduced by #3107.
Steps to Reproduce
- A sample test app is already running.
- Stop the main bbs process, or confirm that it has died. Another instance should take over the control plane role within 15 seconds. While the failover is in progress and the application is still running, send a request to change the name of the app:
cf curl "/v3/apps/3b9f7a3f-d929-4053-8122-8215733cc81b" -X PATCH -d '{"name": "nodejs-app-renamed"}'
- Query the app by its guid. The name has indeed changed, even though the PATCH request returned a
503 Runner is unavailable
error.
Expected result
The request should return 200 instead of a 503 error.
Current result
The user receives the following error:
{"errors":[{"detail":"Runner is unavailable: Process Guid: 3b9f7a3f-d929-4053-8122-8215733cc81b-6ef4b4f1-8cea-48c3-8888-888bb4723380: Connection refused - connect(2) for \"bbs.service.cf.internal\" port 8889 (bbs.service.cf.internal:8889)","title":"CF-RunnerUnavailable","code":170015}]}
Possible Fix
- Prevent 503 errors during PATCH v3/apps requests by adding a short exponential backoff with a reasonable timeout. Simple retries were added in the PR "Use Net::HTTP::Persistent in Diego client" (#3170) to mitigate the risk of Runner is unavailable errors during a cf push when the client is unable to reach a bbs instance.
- If there is a sync between the CC and Diego, return 200 instead of 503 and improve error logging.
- Or this is the expected behaviour.
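To make the first option concrete, here is a minimal sketch of an exponential backoff wrapper. It is not the Cloud Controller or Diego client implementation. The method name with_backoff, its parameters and their defaults are all assumptions chosen for illustration. The injectable sleeper exists only so the delays can be observed without actually waiting.

```ruby
# Hypothetical helper: retries the block when the connection is refused,
# doubling the delay after each failed attempt. Names and defaults are
# assumptions for illustration, not part of any existing client.
def with_backoff(max_attempts: 4, base_delay: 0.5, max_delay: 5.0,
                 retry_on: [Errno::ECONNREFUSED],
                 sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *retry_on
    # Re-raise the original error once the attempt budget is spent.
    raise if attempt >= max_attempts

    # Delays grow as base_delay, 2x, 4x, ... and are capped at max_delay.
    delay = [base_delay * (2**(attempt - 1)), max_delay].min
    sleeper.call(delay)
    retry
  end
end
```

With these defaults the wrapper waits at most 3.5 seconds in total (0.5 + 1 + 2) before giving up. Since a failover can take up to 15 seconds according to the steps above, the attempt count or delays would need tuning to cover the whole window, at the cost of holding the API request open for longer.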