Upgrade Base Image to Amazon Linux 2023 #1122

Merged
triarius merged 39 commits into v6 from pdp-695-amazon-linux-2023-for-elastic-ci-stack
Jun 9, 2023

Contributor

@triarius triarius commented Jun 6, 2023

Thanks for bearing with us, everyone. We have finally gotten around to baking Amazon Linux 2023 base images and getting them to run builds!

Although @toothbrush made a valiant effort in #1103, their attempt and our previous attempts were DoA because the user data startup scripts were failing for various reasons. As a result, instances were marked as unhealthy soon after they booted, and the ASG kept booting new instances to replace them.

While fixing this, I took the opportunity to upgrade a few packages, and I've preferred to install these through the AL2023 repos as much as possible. Some older packages have been replaced where their functionality was provided by other packages that are available by default.

Consider adding these to the changelog at the time of release.

Added

  • Support for building multiarch docker images out of the box

Changed

Upgraded

  • aws-cli v2 (follows version in repo)
  • openssl 3 (follows version in repo)
  • qemu-user-static to v7.0.0
  • amazon-ssm-agent (follows version in repo)
  • python3 to 3.9 (follows version in repo)
  • docker compose v2 to v2.18.3
  • docker cli to 24.0.2 (follows version in repo)
    • Note: docker daemon remains at 20.10.23 (follows version in repo)
    • Note: the docker group has gid 993 now

Removed

  • python2
  • cronie (replaced with systemd timers)
  • docker-compose v1
  • aws session-manager plugin

CI

  • packer installation scripts have been consolidated into fewer scripts
  • goss upgraded to 0.3.23
  • many more goss assertions, including the ability to run arm64 and amd64 docker images (through qemu-user-static or otherwise)

@triarius triarius force-pushed the pdp-695-amazon-linux-2023-for-elastic-ci-stack branch from 1fc70e1 to 1e431db Compare June 6, 2023 11:46
@triarius triarius force-pushed the pdp-695-amazon-linux-2023-for-elastic-ci-stack branch from 1e431db to 8752ac0 Compare June 6, 2023 12:45
},
{
"type": "shell",
"script": "scripts/upgrade-kernel.sh"
Contributor Author

Removed

@@ -1,2 +1,2 @@
-buildkite-agent:1001:1
+buildkite-agent:993:1
Contributor Author

The docker rpm installed from the repos automatically creates a docker group with this gid.

Member

My spidey senses tingle a bit with this change.
I vaguely remember some fragile things hanging off these subgid mappings.
But, it's probably fine.
As long as we test it with a rootless container? Maybe via Buildkite's docker plugin or something?
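
Before running a rootless or userns-remapped container, one cheap sanity check is to compare the start of the subgid mapping with the gid of the docker group. The following is only a minimal sketch, assuming the file uses the standard `name:start:count` subgid format; the function name `subgid_start` and its optional file argument are hypothetical and are not part of this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the first subordinate gid mapped for a user in a subgid-style file.
# Usage: subgid_start <user> [file]   (file defaults to /etc/subgid)
# Returns non-zero if the user has no entry. (Hypothetical helper.)
subgid_start() {
  local user="$1" file="${2:-/etc/subgid}"
  awk -F: -v u="$user" '
    $1 == u { print $2; found = 1; exit }
    END { if (!found) exit 1 }
  ' "$file"
}

# Example, on a built image (assumes the docker group exists):
#   docker_gid="$(getent group docker | cut -d: -f3)"
#   [[ "$(subgid_start buildkite-agent)" == "$docker_gid" ]] || echo "subgid mismatch" >&2
```

This only confirms that the two numbers agree. It does not replace actually running a container with user namespace remapping enabled.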

@@ -24,4 +24,3 @@ KillMode=process

[Install]
WantedBy=multi-user.target
DefaultInstance=1
Contributor Author

@triarius triarius Jun 6, 2023

This is not needed, as this unit is not used as a template. Leaving it in produces a warning in the journal.

cat <<< "$(jq '."data-root"="/mnt/ephemeral/docker"' /etc/docker/daemon.json)" > /etc/docker/daemon.json
fi

# Customise address pools
cat <<<"$(jq '."default-address-pools"=[{"base":"172.17.0.0/12","size":20},{"base":"192.168.0.0/16","size":24}]' /etc/docker/daemon.json)" >/etc/docker/daemon.json

systemctl restart docker
Contributor Author

Need to restart docker to get the new settings.

Contributor Author

These are now systemd timers
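
For readers unfamiliar with the cron-to-timer pattern, a oneshot service paired with a timer unit is the usual shape. The sketch below is illustrative only: the function `write_timer_units`, the unit name `example-cleanup`, and the hourly schedule are all hypothetical and are not taken from this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a oneshot service plus an hourly timer into a unit directory.
# Arguments: <unit_dir> <name> <command>. All names are hypothetical.
write_timer_units() {
  local unit_dir="$1" name="$2" command="$3"

  # The service does the work that the cron entry used to do.
  cat > "${unit_dir}/${name}.service" <<EOF
[Unit]
Description=${name} (run by ${name}.timer)

[Service]
Type=oneshot
ExecStart=${command}
EOF

  # The timer replaces the cron schedule. Persistent=true catches up on
  # runs that were missed while the machine was off.
  cat > "${unit_dir}/${name}.timer" <<EOF
[Unit]
Description=Run ${name} hourly

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Example (as root, on a real host):
#   write_timer_units /etc/systemd/system example-cleanup /usr/local/bin/example-cleanup
#   systemctl daemon-reload
#   systemctl enable --now example-cleanup.timer
```

Note that it is the timer, not the service, that gets enabled. The service is started by the timer when the schedule fires.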

Contributor Author

Moved to

GIT_LFS_VERSION=3.3.0
echo "Installing git lfs ${GIT_LFS_VERSION}..."
pushd "$(mktemp -d)"
curl -sSL https://github.com/git-lfs/git-lfs/releases/download/v${GIT_LFS_VERSION}/git-lfs-linux-${ARCH}-v${GIT_LFS_VERSION}.tar.gz | tar xz
sudo git-lfs-${GIT_LFS_VERSION}/install.sh
popd

Contributor Author

Moved to

S3_SECRETS_HELPER_VERSION=2.1.6
echo "Downloading s3-secrets-helper ${S3_SECRETS_HELPER_VERSION}..."
sudo curl -Lsf -o /usr/local/bin/s3secrets-helper \
"https://github.com/buildkite/elastic-ci-stack-s3-secrets-hooks/releases/download/v${S3_SECRETS_HELPER_VERSION}/s3secrets-helper-linux-${ARCH}"
sudo chmod +x /usr/local/bin/s3secrets-helper

Contributor Author

@triarius triarius Jun 7, 2023

This isn't necessary. The session manager plugin is used to allow logging into managed nodes with the aws-cli. A buildkite-agent typically does not (arguably never does) need to do that. It DOES need to be logged into by SSM, but the ssm-agent is there for that. We install the ssm agent on

and start it here:
sudo systemctl enable --now amazon-ssm-agent

@triarius triarius marked this pull request as ready for review June 7, 2023 00:37
@triarius triarius requested a review from a team June 7, 2023 00:37
Comment on lines -36 to -60
if [ "${MACHINE}" == "x86_64" ]; then
  echo "Downloading docker-compose..."
  sudo curl -Lsf -o /usr/bin/docker-compose https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-Linux-x86_64
  sudo chmod +x /usr/bin/docker-compose
  docker-compose --version
elif [[ "${MACHINE}" == "aarch64" ]]; then
  sudo yum install -y gcc-c++ libffi-devel openssl11 openssl11-devel python3-devel

  # docker-compose depends on the cryptography package, v3.4 of which
  # introduces a build dependency on rust; let's avoid that for now.
  # https://github.com/pyca/cryptography/blob/master/CHANGELOG.rst#34---2021-02-07
  # This should be unpinned ASAP; hopefully docker-compose will offer binary
  # download for arm64 at some point:
  # https://github.com/docker/compose/issues/7472
  CONSTRAINT_FILE="/tmp/docker-compose-pip-constraint"
  echo 'cryptography<3.4' >"$CONSTRAINT_FILE"
  echo 'urllib3<2' >"$CONSTRAINT_FILE"
  sudo pip3 install --constraint "$CONSTRAINT_FILE" "docker-compose==${DOCKER_COMPOSE_VERSION}"

  docker-compose version
else
  echo "No docker compose option configured for arch ${MACHINE}"
  exit 1
fi

Contributor

chefskiss

Member

@pda pda left a comment

I haven't read this thoroughly enough to add a green tick, but it's looking most excellent 👍🏼


},
{
"type": "shell",
"script": "scripts/install-nvme-cli.sh"
Contributor Author

Most of these have been folded into other scripts.

Comment on lines 94 to 98
if [[ "$(uname -m)" == "aarch64" ]]; then
  AGENT_ARCH="arm64"
else
  AGENT_ARCH="amd64"
fi
Contributor

Elsewhere you've got a nice case statement, how about doing that consistently here?

Suggested change
-if [[ "$(uname -m)" == "aarch64" ]]; then
-  AGENT_ARCH="arm64"
-else
-  AGENT_ARCH="amd64"
-fi
+case $(uname -m) in
+  x86_64) ARCH=amd64;;
+  aarch64) ARCH=arm64;;
+  *) ARCH=unknown;;
+esac

(although this is very nitpicky, feel free to ignore me 😅)

Contributor Author

@triarius triarius Jun 7, 2023

There are many more examples of this sprinkled throughout the code base. I can't accept this as is, though: the variable was $AGENT_ARCH, but the suggestion has it as $ARCH. It's probably nicer to standardise on $ARCH, provided it does not already exist in this script.

Contributor

Ah whoops, i missed that $AGENT_ARCH! Glad someone is paying attention 😅
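
For reference, the case-statement form can be wrapped in a small function so the caller picks the variable name, which sidesteps the $ARCH versus $AGENT_ARCH mismatch. This is a sketch only: the function name `agent_arch_for` is hypothetical, and failing on an unrecognised machine is an assumption that differs from the suggestion above, which sets `unknown`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Map a `uname -m` machine name to the arch suffix used in release artifacts.
# Fails on unrecognised machines rather than returning a placeholder value.
# (Hypothetical helper, not part of this PR.)
agent_arch_for() {
  case "$1" in
    x86_64) echo "amd64" ;;
    aarch64) echo "arm64" ;;
    *)
      echo "Unsupported machine: $1" >&2
      return 1
      ;;
  esac
}

# Example:
#   AGENT_ARCH="$(agent_arch_for "$(uname -m)")"
```

Failing early means a download URL is never built with `unknown` in it, at the cost of aborting the script on any other architecture.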

Contributor Author

@triarius triarius Jun 7, 2023

Moved to

LIFECYCLED_VERSION=v3.3.0
echo "Installing lifecycled ${LIFECYCLED_VERSION}..."
sudo touch /etc/lifecycled
sudo curl -Lf -o /usr/bin/lifecycled \
https://github.com/buildkite/lifecycled/releases/download/${LIFECYCLED_VERSION}/lifecycled-linux-${ARCH}
sudo chmod +x /usr/bin/lifecycled
sudo curl -Lf -o /etc/systemd/system/lifecycled.service \
https://raw.githubusercontent.com/buildkite/lifecycled/${LIFECYCLED_VERSION}/init/systemd/lifecycled.unit

…t instead of "success"

Co-authored-by: paul david <423357+toothbrush@users.noreply.github.com>
Contributor

toothbrush commented Jun 7, 2023 via email

@triarius triarius changed the base branch from master to v6 June 7, 2023 06:45
triarius added 2 commits June 9, 2023 11:59
They need to have some level of nested templates because otherwise,
goss will attempt to evaluate the templates that are intended to be
evaluated by docker.

Also, the elements of `stdout` are regexes, so we adapt the test to check
for each expected element of the list. Goss will report which regex did
not match the output, so we can use this to determine which plugins are
missing. It won't be able to tell if there are other plugins, but that is
very much a feature.
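
The matching behaviour described in this commit message can be imitated in plain shell. The sketch below is not goss: it uses `grep -E`, whose regex dialect differs from Go's, and the function name `missing_patterns` and the plugin names in the example are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print every pattern that matches no line of the given output.
# Returns non-zero if at least one pattern is unmatched.
# Lines matched by no pattern are ignored, so extra plugins pass.
# (Hypothetical helper, not part of this PR.)
missing_patterns() {
  local output="$1"
  shift
  local pattern status=0
  for pattern in "$@"; do
    if ! grep -Eq -- "$pattern" <<<"$output"; then
      echo "$pattern"
      status=1
    fi
  done
  return "$status"
}

# Example (plugin names are illustrative):
#   plugins=$'buildx\ncompose'
#   if ! missing="$(missing_patterns "$plugins" '^buildx$' '^compose$')"; then
#     echo "missing plugins: ${missing}" >&2
#   fi
```

Because each pattern is checked on its own, a failure names the missing item instead of reporting one large mismatch.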
@triarius triarius force-pushed the pdp-695-amazon-linux-2023-for-elastic-ci-stack branch from db766d3 to 3c7c3a5 Compare June 9, 2023 01:59
@triarius triarius merged commit 83d6a06 into v6 Jun 9, 2023
@triarius triarius deleted the pdp-695-amazon-linux-2023-for-elastic-ci-stack branch June 9, 2023 04:35
@triarius triarius mentioned this pull request Jun 18, 2023
Merged
ellsclytn added a commit that referenced this pull request Oct 24, 2024
This was removed in the migration to Amazon Linux 3 with the belief that
there was no use case for the Session Manager Plugin being present on
the agent[1]. It has since been realised that it is quite useful. For
example, the agent may be used to start ECS Tasks which perform work in
other environments/network configurations (such as Database migrations
during deployments).

1. #1122 (comment)
ellsclytn added a commit that referenced this pull request Oct 28, 2024
This was removed in the migration to Amazon Linux 3 with the belief that
there was no use case for the Session Manager Plugin being present on
the agent[1]. It has since been realised that it is quite useful. For
example, the agent may be used to start ECS Tasks which perform work in
other environments/network configurations (such as Database migrations
during deployments).

1. #1122 (comment)
Development

Successfully merging this pull request may close these issues.

FR: Update to Amazon Linux 2022
4 participants