CASMTRIAGE-7221: Sat unable to pull image #3673

Closed
wants to merge 1 commit into from

Conversation

annapoorna-s-alt
Contributor

@annapoorna-s-alt annapoorna-s-alt commented Sep 27, 2024

Summary and Scope

Sat unable to pull the image on Tyr after a CSM 1.6.0-alpha.55 upgrade
Solution: Add a zypper update cray-sat-podman after CSM 1.6 RPMs are uploaded to Nexus

Issues and Related PRs

Resolves CASMTRIAGE-7221.

Testing

List the environments in which these changes were tested.

Tested on:

starlord

Test description:

Tested the setup-nexus.sh script in the management nodes rollout stage while upgrading starlord from CSM 1.5 to 1.6

Risks and Mitigations

Low

Pull Request Checklist

  • Version number(s) incremented, if applicable
  • Copyrights updated
  • License file intact
  • Target branch correct
  • Testing is appropriate and complete, if applicable
  • HPC Product Announcement prepared, if applicable

@annapoorna-s-alt
Contributor Author

Testing output from starlord

ncn-m001:/etc/cray/upgrade/csm/media/upgrade-products-1.6/csm-1.6.0-beta.5/lib # sat --version
Trying to pull registry.local/artifactory.algol60.net/csm-docker/stable/cray-sat:3.32.6...
Getting image source signatures
Copying blob b311eebc88fa done
Copying blob 35c23073c252 skipped: already exists
Copying blob 4042a5b85226 done
Copying blob 4f2f07f72b2e done
Copying blob 41e6d140149f done
Copying blob d078792c4f91 skipped: already exists
Copying blob f88108bc5538 done
Copying blob 96f9be8fa567 done
Copying blob 6c7e26c5efce done
Copying blob 68ddf38ba4e1 done
Copying blob a753459f4feb done
Copying config c02c7c970e done
Writing manifest to image destination
Storing signatures
sat 3.32.6

Contributor

@haasken-hpe haasken-hpe left a comment

I think the zypper update needs to happen later in this script.

Can you please clarify how this was tested? Was the setup-nexus.sh script just re-run after the problem was observed?

If that's the case, then I assume the lib/setup-nexus.sh script had already been run (by prerequisites.sh). So the updated cray-sat-podman RPM would have already been uploaded to Nexus by the previous invocation of this script. That would explain why the added zypper update command updated the cray-sat-podman RPM to the latest version despite being before the nexus-upload commands that upload RPMs to the package repos in Nexus.
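The reasoning above can be illustrated with a toy model. This is only a sketch: it makes no real Nexus or zypper calls, the `old`/`new` version labels are made up, and `upload_rpms`, `update_package`, and `run_patched_script` are hypothetical stand-ins for the nexus-upload step, the added zypper update, and one pass of the patched script.

```shell
#!/bin/sh
# Toy model only: no real Nexus or zypper calls; version labels are made up.
set -eu

repo_version="old"       # newest cray-sat-podman RPM available in Nexus
installed_version="old"  # cray-sat-podman version installed on the node

# Hypothetical stand-in for the nexus-upload step.
upload_rpms() { repo_version="new"; }

# Hypothetical stand-in for zypper update: installs whatever the repo offers.
update_package() { installed_version="${repo_version}"; }

# One pass of the script as originally patched: update first, upload second.
run_patched_script() {
  update_package
  upload_rpms
}

run_patched_script
first_run="${installed_version}"   # new RPM was not yet in Nexus at update time

run_patched_script
second_run="${installed_version}"  # the earlier pass had already uploaded it

echo "first run:  ${first_run}"
echo "second run: ${second_run}"
```

In this model the first pass leaves the old package installed and only the re-run picks up the new one, which would match a test performed by re-running the script after prerequisites.sh had already invoked it.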

@@ -31,6 +31,9 @@ skopeo-copy "${sat_image}:${sat_version}" "${sat_image}:csm-latest"
mkdir -p /opt/cray/etc/sat
echo "${sat_version}" > /opt/cray/etc/sat/version

# Update the cray-sat-podman package to ensure version consistency
zypper update cray-sat-podman
Contributor

Upon closer review, I think this is the wrong location for this command. It looks like lines 40-46 below are the lines that actually upload the new RPMs from this CSM release into Nexus. We want to run the zypper update after that happens. I'm not exactly sure which repo has the newer cray-sat-podman RPM in it, so we could just run it after all the nexus-upload commands.

Contributor

I took a look at Nexus on starlord, and it looks like it's included in the csm-${RELEASE_VERSION}-embedded repository. I think that's where packages which are also already a part of the new CSM management node images are located, if I recall correctly.

So the zypper update would need to be done after line 46.
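A minimal sketch of the ordering being suggested, assuming the upload of the csm-${RELEASE_VERSION}-embedded repository is what makes the newer RPM visible. The helpers `upload_repo` and `update_package` are hypothetical stubs that only log calls; they are not the script's actual nexus-upload invocation or a real zypper run, and the `RELEASE_VERSION` value is made up.

```shell
#!/bin/sh
# Sketch only: the two helpers below are hypothetical stubs that log calls
# instead of touching Nexus or running zypper.
set -eu

CALL_LOG=""

# Stand-in for the script's nexus-upload step (real arguments not shown here).
upload_repo() { CALL_LOG="${CALL_LOG}upload:$1 "; }

# Stand-in for `zypper update <package>`.
update_package() { CALL_LOG="${CALL_LOG}update:$1 "; }

RELEASE_VERSION="1.6.0"  # assumed value, for illustration only

# 1. Upload this release's RPMs, including the embedded repository.
upload_repo "csm-${RELEASE_VERSION}-embedded"

# 2. Only after the upload can the update see the newer cray-sat-podman RPM.
update_package "cray-sat-podman"

echo "${CALL_LOG}"
```

The point is only the sequence: the package update is the last step, after every upload that could carry the new cray-sat-podman RPM.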

Contributor Author

Can you please clarify how this was tested? Was the setup-nexus.sh script just re-run after the problem was observed?

Yes. After QE reported the issue, I observed it on starlord while upgrading the system from 1.5 to 1.6. With these changes in place I reran the setup-nexus.sh script; that is the output I added above. Per your comment, I have moved the change so that it runs after the new RPMs from the CSM release are uploaded into Nexus.

Sat unable to pull the image on Tyr after a CSM 1.6.0-alpha.55 upgrade
Solution: Add a zypper update cray-sat-podman after CSM 1.6 RPMs are uploaded to Nexus
@haasken-hpe
Contributor

@rustydb, @mitcharf, @mtupitsyn, does it seem reasonable to you to add a zypper update at this point in lib/setup-nexus.sh? Annapoorna has been able to test this after the issue was reported, but has not been able to test the modified version of the script at exactly the time it is first run by prerequisites.sh in an upgrade scenario. Is there any reason a zypper update would not work as expected at that point?

@haasken-hpe
Contributor

This is obsoleted by https://github.com/Cray-HPE/docs-csm/pull/5486/files

That PR adds a call to zypper update immediately after running lib/setup-nexus.sh, which is basically equivalent to this change.
