
Conversation

@bparees
Contributor

@bparees bparees commented Mar 2, 2015

For now this will retry retryable failures forever in a tight loop. I'm still contemplating capping it at something like 100. The problem is that without backoff (which the current retrycontroller logic deliberately does not offer), if there's a temporary API outage we'll burn through any reasonable number of retries rapidly.
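To illustrate the concern, here is a minimal sketch of a capped, no-backoff retry loop (hypothetical names, not the actual retrycontroller code). During a temporary outage every attempt fails immediately, so even a generous cap is exhausted almost instantly:

```go
package main

import (
	"errors"
	"fmt"
)

// retry runs fn up to maxAttempts times; maxAttempts < 0 means retry forever.
// There is deliberately no sleep between attempts: this is the "tight loop",
// so all attempts can land inside the same outage window.
func retry(maxAttempts int, fn func() error) error {
	var err error
	for i := 0; maxAttempts < 0 || i < maxAttempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	attempts := 0
	err := retry(100, func() error {
		attempts++
		return errors.New("api endpoint unavailable") // simulated outage
	})
	fmt.Printf("gave up after %d attempts: %v\n", attempts, err)
}
```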

@bparees
Contributor Author

bparees commented Mar 2, 2015

[test]

@bparees bparees changed the title retry build errors [WIP] retry build errors Mar 2, 2015
@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_openshift3/1238/)

Contributor

I don't think this message belongs here - retry decisions belong in a higher layer.

Contributor Author

The controller is the bit that has the logic to know whether a particular event can or should be retried (based on what state has been modified, or would be duplicated, if the event is replayed), so I don't see how you can (well, should) move the retry decision out of the controller. The controller is already making the decision to retry or not based on the type of error it returns, and it knows the logic of the retry controller anyway. There's really no avoiding the fact that the controller needs some knowledge of the error types the retry handler expects and how it will handle them; the less of that there is, the better, IMHO. (Which is why I preferred the approach that just returned RetryableError vs. FatalError wrapping the real error.)
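A rough sketch of that preferred "wrap the real error" approach (the type names here are illustrative, not necessarily the merged API): the controller classifies its own failures, and the retry handler only inspects the wrapper type, never the underlying error.

```go
package build

import "fmt"

// RetryableError signals that replaying the event is safe.
type RetryableError struct{ Err error }

func (e RetryableError) Error() string { return fmt.Sprintf("retryable: %v", e.Err) }

// FatalError signals that replaying the event would duplicate state.
type FatalError struct{ Err error }

func (e FatalError) Error() string { return fmt.Sprintf("fatal: %v", e.Err) }

// shouldRetry is all a retry handler needs to call; it requires no
// knowledge of the controller's internals beyond the two wrapper types.
func shouldRetry(err error) bool {
	_, ok := err.(RetryableError)
	return ok
}
```

The design trade-off being debated: the controller must know which errors are replay-safe either way, so wrapping keeps that knowledge in one place instead of leaking error-classification rules into a higher layer.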

Contributor

Higher layer = your retry handler. That's where you make the decision about retrying. Adding it here too confuses the issue.

@bparees bparees force-pushed the build_errors branch 4 times, most recently from 657c533 to 482ca27 Compare March 4, 2015 19:21
@bparees bparees changed the title [WIP] retry build errors retry build errors Mar 4, 2015
@bparees
Copy link
Contributor Author

bparees commented Mar 4, 2015

@smarterclayton updated per your comments. Thoughts? Notable and debatable design points:

  1. Retryable errors are retried forever (max attempts == -1) in a tight loop.
  • I'm not averse to reducing this to 10 or 100, but since it's a tight loop, my guess is that anything that fails once will fail N times, because the error condition won't be corrected fast enough for a later attempt to have a chance of succeeding (e.g., the API endpoint becoming available again).
  2. All errors (retried or not) go through kutil.HandleError.
  • If something is spinning in a retry loop, it seems important to make that apparent in the logs, particularly since we're not going to eventually perm-fail it (see point 1). A sketch of how these two points fit together follows below.
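Here is a hypothetical sketch of points 1 and 2 combined (handleError stands in for the real kutil.HandleError; the handler shape is illustrative): every failure is surfaced in the logs before the requeue decision, so a spinning retry loop is visible rather than silent.

```go
package build

import "log"

// handleError mimics kutil.HandleError: every error is logged, even ones
// that will be retried, so an endless retry loop shows up in the logs.
func handleError(err error) {
	log.Printf("error (will be evaluated for retry): %v", err)
}

// retryHandler decides whether a failed event should be requeued.
type retryHandler struct {
	maxRetries int // -1 == retry forever
}

func (r retryHandler) onError(err error, retries int) bool {
	handleError(err)
	if r.maxRetries < 0 {
		return true // retry forever
	}
	return retries < r.maxRetries
}
```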

Contributor

Godoc

@smarterclayton
Contributor

Minor comment, otherwise this is fine.

@bparees
Contributor Author

bparees commented Mar 4, 2015

godoc'd. [merge]

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_openshift3/1115/) (Image: devenv-fedora_973)

@openshift-bot
Contributor

Evaluated for origin up to 2841124

@smarterclayton
Contributor

post-hoc LGTM


openshift-bot pushed a commit that referenced this pull request Mar 4, 2015
@openshift-bot openshift-bot merged commit 4cc3962 into openshift:master Mar 4, 2015
@bparees bparees deleted the build_errors branch March 5, 2015 15:28
Miciah pushed a commit to Miciah/origin that referenced this pull request Jun 27, 2018
UPSTREAM: 59931: do not delete node in openstack, if those still exist in cloudprovider