nodeadm: retry IMDS 404 errors #1970

Open
wants to merge 1 commit into
base: main


Member

@ndbaker1 ndbaker1 commented Sep 17, 2024

Issue #, if available:

Description of changes:

nodeadm fails its IMDS calls when the instance's credentials propagate slowly. In that window IMDS returns a 404 with an error indicating that no credentials were passed.

nodeadm now checks for this error message and treats it as retryable.

Aug 28 19:12:30 localhost nodeadm[1491]: {"level":"info","ts":1724872350.0166261,"caller":"init/init.go:148","msg":"Fetching instance details.."}
Aug 28 19:12:30 localhost nodeadm[1491]: SDK 2024/08/28 19:12:30 DEBUG attempting waiter request, attempt count: 1
Aug 28 19:12:30 localhost nodeadm[1491]: SDK 2024/08/28 19:12:30 DEBUG request failed with unretryable error http response error StatusCode: 404, request to EC2 IMDS failed

cloud-init logs show IMDS resolving about 2 seconds later:

2024-08-28 19:12:32,448 - url_helper.py[DEBUG]: Read from http://169.254.169.254:80/2021-03-23/meta-data/instance-id (200, 19b) after 1 attempts
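
The retry behavior described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this PR: `statusError`, `isRetryable`, and `retry` are hypothetical names, the fetch function is simulated, and the sketch keys off the HTTP status code rather than the error message text that the change actually inspects.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// statusError is a hypothetical stand-in for an HTTP response error
// returned by a metadata client. It is not an AWS SDK type.
type statusError struct {
	StatusCode int
}

func (e *statusError) Error() string {
	return fmt.Sprintf("http response error StatusCode: %d", e.StatusCode)
}

// isRetryable reports whether err wraps a statusError carrying a 404,
// which this sketch assumes to be transient shortly after boot.
func isRetryable(err error) bool {
	var se *statusError
	return errors.As(err, &se) && se.StatusCode == http.StatusNotFound
}

// retry calls fn until it succeeds, returns a non-retryable error,
// or maxAttempts is exhausted. It sleeps for backoff between attempts.
func retry(maxAttempts int, backoff time.Duration, fn func() (string, error)) (string, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		out, err := fn()
		if err == nil {
			return out, nil
		}
		if !isRetryable(err) {
			return "", err
		}
		lastErr = err
		if attempt < maxAttempts {
			time.Sleep(backoff)
		}
	}
	return "", fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	calls := 0
	// Simulated metadata fetch: returns 404 twice, then succeeds.
	fetch := func() (string, error) {
		calls++
		if calls < 3 {
			return "", &statusError{StatusCode: http.StatusNotFound}
		}
		return "i-placeholder", nil // placeholder value, not a real instance ID
	}

	id, err := retry(5, 10*time.Millisecond, fetch)
	fmt.Println(id, err, calls)
}
```

A fixed backoff keeps the sketch short. Given the roughly 2 second gap in the logs, a real implementation would probably want an attempt budget that comfortably covers a few seconds.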

Other notable changes:

  • removed all direct uses of the aws imds client besides in the internal helper implementation
  • fixed gosec complaints

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Testing Done

See this guide for recommended testing for PRs. Some tests may not apply. Completing tests and providing additional validation steps are not required, but it is recommended and may reduce review time and time to merge.

@ndbaker1
Member Author

/ci

Contributor

@ndbaker1 roger that! I've dispatched a workflow. 👍

Contributor

@ndbaker1 the workflow that you requested has completed. 🎉

AMI variant      Build        Test
1.23 / al2023    success ✅   success ✅
1.24 / al2023    success ✅   success ✅
1.25 / al2023    success ✅   success ✅
1.26 / al2023    success ✅   success ✅
1.27 / al2023    success ✅   success ✅
1.28 / al2023    success ✅   success ✅
1.29 / al2023    success ✅   success ✅
1.30 / al2023    success ✅   success ✅

@mattcjo

mattcjo commented Sep 20, 2024

@ndbaker1 Quick update on the root cause: the node successfully authenticated with IMDS, but the resource it was trying to access was missing. Retry logic is still an appropriate solution here.
