Ensure unhealthy Windows instances get marked correctly #614

Merged: 3 commits, Aug 7, 2019

Conversation

jeremiahsnapp
Contributor

Sometimes Windows instances fail to come up completely but are not marked unhealthy. The randomly generated password for the buildkite-user sometimes fails the password policy requirements, which causes the bk-install-elastic-stack.ps1 script to fail. Normally that would trigger the on_error trap, which marks the instance unhealthy so that it gets replaced. However, a bug in the on_error trap prevented it from getting the instance ID. The instance is therefore left running indefinitely without a running buildkite-agent service, and jobs can wait indefinitely for an instance because the scaler lambda believes a healthy instance already exists.

This PR fixes the bug in the on_error trap so that a failure of the bk-install-elastic-stack.ps1 script marks the instance unhealthy. It also adds a retry loop for randomizing the buildkite-agent user's password, giving it a few extra chances to get a good password before failing.
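For readers who want the shape of the retry loop without opening the diff, here is a rough, language-neutral sketch in Python. It is not the PowerShell from this PR. The names `generate_password`, `meets_policy`, `set_password_with_retry`, and `MAX_ATTEMPTS`, the attempt count, and the simplified "three of four character classes" rule are all assumptions made for illustration.

```python
import secrets
import string

# Hypothetical attempt limit; the PR only says "a few extra chances".
MAX_ATTEMPTS = 3


def generate_password(length=32):
    """Return a random password drawn from letters, digits and some symbols."""
    alphabet = string.ascii_letters + string.digits + "!@#$%^&*"
    return "".join(secrets.choice(alphabet) for _ in range(length))


def meets_policy(password):
    """Simplified stand-in for a complexity policy (assumed, not the real rule):
    require at least three of lowercase, uppercase, digit, symbol."""
    categories = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(not c.isalnum() for c in password),
    ]
    return sum(categories) >= 3


def set_password_with_retry(set_password, generate=generate_password,
                            max_attempts=MAX_ATTEMPTS):
    """Try a fresh random password up to max_attempts times.

    set_password is a caller-supplied function that raises ValueError when
    the password is rejected. Returns the attempt number that succeeded.
    Raises RuntimeError when every attempt is rejected, so an outer error
    handler can mark the instance unhealthy.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate()
        try:
            set_password(candidate)
            return attempt
        except ValueError as err:
            # Rejected by policy: remember the error and try a new password.
            last_error = err
    raise RuntimeError(
        f"no acceptable password after {max_attempts} attempts"
    ) from last_error
```

The important property is that the loop still fails loudly after the last attempt, so the error path that marks the instance unhealthy is still reached rather than being masked by the retries.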

The PR also fixes the CloudFormation by putting quotes around %v for a Windows environment variable.

Signed-off-by: Jeremiah Snapp <jeremiah@chef.io>
Signed-off-by: Jeremiah Snapp <jeremiah@chef.io>
Sometimes the random password generated doesn't match the
password policy on Windows. This Try/Catch loop gives it
a few extra chances to get a good password before failing.

Signed-off-by: Jeremiah Snapp <jeremiah@chef.io>
@lox lox merged commit eb6c493 into buildkite:master Aug 7, 2019
@jeremiahsnapp jeremiahsnapp deleted the windows-fixes branch August 8, 2019 13:19