Increase timeout to 80s. Backoff poll frequency. #902
This is a minor improvement only.
To be frank, I don't really like that we use polling here. But if it must be, maybe we can decide in advance what a reasonable amount of time for this operation would be, and then use an approach such as this (taken from an earlier version of the entropy test):

```python
import logging
import time

logger = logging.getLogger(__name__)


def retry(func, exc_type, timeouts=(8, 7, 15, 10, 20, 30, 60)):
    if isinstance(exc_type, str):
        exc_type = exc_type.split(',')
    timeout_iter = iter(timeouts)
    # do an initial sleep because func is known to fail at first anyway
    time.sleep(next(timeout_iter))
    retries = 0
    while True:
        try:
            func()
        except Exception as e:
            retries += 1
            timeout = next(timeout_iter, None)
            if timeout is None or e.__class__.__name__ not in exc_type:
                raise
            logger.debug(f"Initiating retry in {timeout} s due to {e!r} during {func!r}")
            time.sleep(timeout)
        else:
            break
    if retries:
        logger.debug(f"Operation {func!r} successful after {retries} retries")
```

It uses a sequence of times instead of a fixed function. Depending on how long we expect the backup to take, we could start with a high initial number, such as 30, then continue with 15 and 10, and then increase again, maybe 15 and 20. In sum, this would be 90 seconds.
Well, it's hard to predict the time it will take.
Timeout was 60s; increase to 80s. This may help on heavily loaded storage environments.

We see ~60 debug messages for each timed-out wait. This is more than needed and puts more load on the control plane than necessary. So change the approach: start with a 0.5s polling frequency, but increase the wait time by 0.1s each iteration, slowly backing off. This way we produce only ~35 debug lines and API calls before running into the timeout.

Sidenote: Exponential backoff is a well-known approach to deal with congestion. We kept it simple (linear backoff), as our timeout is not huge.

Signed-off-by: Kurt Garloff <kurt@garloff.de>
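A minimal sketch of the linear back-off described in that commit message (the function name `poll` and the injectable `sleep` parameter are illustrative, not the actual test code):

```python
import time


def poll(check, timeout=80.0, initial=0.5, increment=0.1, sleep=time.sleep):
    """Call check() until it returns truthy or `timeout` seconds of sleep
    have accumulated. Sleep intervals grow linearly: 0.5, 0.6, 0.7, ...
    """
    waited = 0.0
    interval = initial
    attempts = 0
    while waited < timeout:
        attempts += 1
        if check():
            return attempts  # success: number of API calls it took
        sleep(interval)
        waited += interval
        interval += increment
    return 0  # timed out
```

With these defaults the loop gives up after 36 polls (0.5 + 0.6 + ... + 4.0 sums past 80s), matching the roughly 35 debug lines and API calls mentioned in the commit message.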
d5da9ce to 9d721f2
make this consistent with the other tests; prevent confidential info from being disclosed
Signed-off-by: Matthias Büchse <matthias.buechse@alasca.cloud>