-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand aarch64 support to all CI images #2380
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested a few of the images on my x86 mac and it works great!
For CI, you can push images to your own dockerhub or GH registry account and change .circleci/config.yml
for testing.
cfa574d
to
c086b03
Compare
Codecov Report
@@ Coverage Diff @@
## master #2380 +/- ##
==========================================
+ Coverage 98.34% 98.36% +0.01%
==========================================
Files 1101 1102 +1
Lines 58940 59222 +282
==========================================
+ Hits 57967 58252 +285
+ Misses 973 970 -3
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
77bd165
to
12d7cb7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few notes, but this is looking nice!
One thing I would suggest is to avoid a big PR that tries to do everything in one big step. For instance -- why not merge the Dockerfile
changes and then add the rest of the improvements separately?
1291323
to
23e82b5
Compare
Good point. Let's have the updates and cleanups proposed as separate PRs and keep this a minimal drop-in replacement. I do want to move docker compose and CircleCI to use these images ASAP though, so that it's all synchronised: it doesn't make much sense to have these rely on images that come from this repo but the Dockerfiles don't exist anymore (it'd be awkward if a fix, update, or change is suddenly needed). I've added changes to that end in this PR. |
CircleCI failures so far: Ruby 2.1 and Ruby 2.2
JRuby 9.2.8.0
JRuby 9.2.18.0
|
Working from `arm64-darwin` machines becomes an increasingly common thing, and presumably we want to be able to run all Ruby versions on that. Our `docker-compose` relies on images that aren't currently built for `aarch64-linux`, this commit changes that. Special cases: * Removed `protoc` from all `Dockerfile`s since it appears to be unused now * `ruby:2.1` has no aarch64 image published, Rebuilt it from a Debian Stretch base * `ruby:2.2` has an aarch64 image published but is Debian Jessie based, and Jessie had aarch64 upon release but as an "extra" arch, which is not part of LTS, so packages vanished when Debian moved the repos to `archive`, so any `apt-get` irrevocably fails. Rebuilt from a Debian Stretch base * JRuby 9.2 has no aarch64 image published, was based on `openjdk:8-jre`, I rebuilt both from the last `openjdk:8-jre` which now has aarch64 images, these being based on bullseye (there are also Buster-based images upstream if it turns out it matters) * TruffleRuby 21 has no aarch64 image published, and it wasn't an official one anyway. Added TruffleRuby 22 from the official Oracle registry, and removed TruffleRuby 21 The last bit of adjustment is Stretch-based ones (so 2.1, 2.2, 2.3, 2.4) which do build until they fail to run the first aarch64 release of `docker-compose` due to an outdated glibc version. This has been worked around by installing it from `stretch-backports`, which has a 1.21 version close to 1.29 we presumably want. 2.4 had a note about staying on Stretch instead of Buster because of `sequel`, which may not be relevant anymore. It was initially moved to Buster to work around the `docker-compose` failure, but is currently moved back to Stretch using the above backport alternative. In all cases attempts were made to stay as close as possible to the original `Dockerfile`s for each version and their base. When things are made to differ it's because there seems to be no realistic alternative. There are opportunistic moves to streamline and clean all of this but this was kept out of scope of this commit to maximise consistency and ease of reviewing. All images are now building and running for both `x86_64` and `aarch64`.
Also drop arm64 variants
We try to use GHA caching to go faster on subsequent runs.
GHA cache is limited to 10GB, we overshoot that by a lot with Docker layers. Also, while cache appears to store and restore correctly, it seems like it fails to match. Therefore we stop using the GHA cache type and attempt to use images pushed in the registry instead.
67923a3
to
510be4c
Compare
It seems like there's an issue with JRE 8 + Debian bullseye, which is what openjdk:8-jre is based on nowadays. - jruby:9.2.18.0-jre8 was based on openjdk:8-jre-buster, moving it back to that - jruby:9.2.8.0-jre was based on stretch but had no aarch64, so moving it to buster as well Both `rake spec:httprb` and `rake spec:ethon` now pass on JRuby 9.2
MySQL 8.0.4 changed the authentication set at user creation, defaulting to `caching_sha2_password`, and client libs predating 8.0.3, which includes MariaDB, cannot connect. MariaDB finally updated but took some time to catch up, so there's a time gap. Another failure mode is when client libs are built against GnuTLS. MariaDB libs are what is installed by default, so when running against a recently pulled MySQL 8 server, outdated clients will fail to connect. This affects Ruby images in the following way: - 2.1, 2.2, 2.3, 2.4: Authentication plugin 'caching_sha2_password' cannot be loaded - 2.5, 2.6, 2.7: RSA Encryption not supported - caching_sha2_password plugin was built with GnuTLS support This works around it by reverting to `mysql_native_password` as a default. We are warned about this now: > 'default_authentication_plugin' is deprecated and will be removed in a > future release. Please use authentication_policy instead But `authentication_policy` appears to work differently. This will have to do to get us out of the wood.
Ruby 2.1 and 2.2Errors above could not be reproduced locally: I have a feeling they may be due to a bundle mismatch. Indeed for these versions, binary gem builds were cached when installed on Jessie, but these are now Stretch-based. I did find another interesting issue with MySQL 8.0.4 though regarding authentication plugins. JRubyThese crashes appear to echo back to @ivoanjo's jruby/jruby#7033 and I reproduced with:
I too thought they were maybe tied to the JVM:
But that would be in these patch updates, not 8 vs 11... So checking the base userland:
In that issue, @ivoanjo commented:
As it turns out, it may indeed depend not entirely on the JVM version but on the base image from which the JVM image was built. It's hard to know what was exactly the version JRuby 9.2.19.0 (JRE 8) was built at the time. But today it is:
And the following is consistent with @ivoanjo's images:
This may point to jre8 + debian 11 being a problem! So, basing 9.2.8.0 on And basing 9.2.18.0 on
Basing 9.2.8.0 on |
935382b
to
d5caf13
Compare
d5caf13
to
8f88885
Compare
Images are still built on every push, but are not pushed.
I experienced this in another form:
Worked around to confirm my hunch by shooting the cache using a hack in |
0b6dd70
to
99cd2b3
Compare
99cd2b3
to
44188f4
Compare
Interetingly enough, two system test apps are unrelatedly failing because they're based on Jessie, which has a dead GPG key. These images can be reused to fix the corresponding system test ruby apps. |
a908596
to
3cddbe7
Compare
Because CIRCLE_CACHE_VERSION is globally defined in CircleCI, bumping it means starting from a blank universe for all subsequent job runs. If a new PR that breaks binary compatibility ran + saved to cache first then it would produce a matchable key from which other branches would try to build, and fail for older jobs. This also serves as a signal to developers that if they pull new images past a change to this file they should clear their bundle cache volumes and run bundle install and appraisal install again.
3cddbe7
to
ed862b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work! Thank you for also making it easier to rebuild images in the future!
push: | ||
branches: | ||
- "**" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we don't want to keep push
as one of the triggers, right? I'd think we only want manual (workflow_dispatch
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Loic may have been thinking of having docker cache handle this, but it may get slightly annoying -- e.g. we already have like 80 jobs that happen on push between github actions and circleci and it's getting somewhat confusing to find the ones that fail in the github UI, it was definitely not made to scale in that way...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do want to keep the push
trigger.
it's getting somewhat confusing to find the ones that fail in the github UI
It's mostly confusing because CircleCI does not show up in the Checks
tab of a PR so we have to check the summary :/ (I just jump to a tab to CircleCI to really check, much easier)
I do wish GitHub would sort jobs by state: (in the order: failed, cancelled, success, running, pending)
That said, since these images are essentially independent of dd-trace-rb and can be reused to great effect, I initially wanted these images to be defined and built outside of this repo, but @delner was not too keen about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few more notes, but I see no reason not to get this merged and leave other changes to follow-up PRs.
@@ -0,0 +1 @@ | |||
1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the contents of this file can be anything, what do you think of including an explanation at the top of what this file is for? I'm thinking that after we merge this a file called binary_version
with a 1
sounds kinda mysterious but not very self-explanatory ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the whole file contents is checksum
'd by CircleCI I refrained from adding non-semantic things as any change would also trigger a cache bust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm but right now coming in from the outside it's kinda cryptic what this file does -- you need to "reverse engineer" when it was added and maybe find this discussion :/
image: ivoanjo/docker-library:ddtrace_rb_2_5_9 | ||
image: ghcr.io/datadog/dd-trace-rb/ruby:2.5.9-dd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Since the repo name already includes dd-trace-rb
, consider perhaps dropping the -dd
suffix at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suffix is intentional, I will need to have neutral, "bare" ruby images to base myself upon at DataDog/system-tests-apps-ruby, the -dd
suffix means that it's ruby but with some Datadog-specific additions.
Picture it like this:
ghcr.io/datadog/dd-trace-rb/ruby:2.5.9
should be likeruby:2.5.9
from Docker hub (except more up to date WRT bundler + base Debian).ghcr.io/datadog/dd-trace-rb/ruby:2.5.9-dd
is the above but with all the DataDog stuff added in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough! Consider documenting that somewhere ;)
\ | ||
&& gem update --system "$RUBYGEMS_VERSION" | ||
|
||
# TODO: update only once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my suggested alternative for this comment (that appears in a few places):
# TODO: This is not the version that ends up in the image; at the bottom of the Dockerfile we upgrade it to a later one. To be fixed in the future
I think it's important to spell out that we update more than once to different versions :)
# To build AND push multi-arch images (but DON'T DO THAT IN GENERAL, unless e.g CI is down): | ||
$ docker buildx build . --platform linux/x86_64,linux/aarch64 -f Dockerfile-3.1.1 -t ghcr.io/datadog/dd-trace-rb/ruby:3.1.1-dd --push |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Since people tend to copy-paste commands and not always edit them, I suggest removing the --push
from the actual command example, and instead just stating on the text "If you ever need to push (....) add --push to the above command".
push: | ||
branches: | ||
- "**" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Loic may have been thinking of having docker cache handle this, but it may get slightly annoying -- e.g. we already have like 80 jobs that happen on push between github actions and circleci and it's getting somewhat confusing to find the ones that fail in the github UI, it was definitely not made to scale in that way...
What does this PR do?
Add first-class support for development on
aarch64-linux
.Motivation
Working from
arm64-darwin
machines becomes an increasingly common thing, and presumably we want to be able to run all Ruby versions on that.Our
docker-compose
relies on images that aren't currently built foraarch64-linux
, this commit changes that.Additional Notes
Special cases:
protoc
from allDockerfile
s since it appears to be unused nowruby:2.1
has no aarch64 image published, Rebuilt it from a Debian Stretch baseruby:2.2
has an aarch64 image published but is Debian Jessie based, and Jessie had aarch64 upon release but as an "extra" arch, which is not part of LTS, so packages vanished when Debian moved the repos toarchive
, so anyapt-get
irrevocably fails. Rebuilt from a Debian Stretch baseopenjdk:8-jre
, I rebuilt both from the lastopenjdk:8-jre
which now has aarch64 images, these being based on bullseye (there are also Buster-based images upstream if it turns out it matters)The last bit of adjustment is Stretch-based ones (so 2.1, 2.2, 2.3, 2.4) which do build until they fail to run the first aarch64 release of
docker-compose
due to an outdated glibc version. This has been worked around by installing it fromstretch-backports
, which has a 1.21 version close to 1.29 we presumably want.2.4 had a note about staying on Stretch instead of Buster because of
sequel
, which may not be relevant anymore. It was initially moved to Buster to work around thedocker-compose
failure, but is currently moved back to Stretch using the above backport alternative.In all cases attempts were made to stay as close as possible to the original
Dockerfile
s for each version and their base. When things are made to differ it's because there seems to be no realistic alternative. There are opportunistic moves to streamline and clean all of this but this was kept out of scope of this commit to maximise consistency and ease of reviewing.All images are now building and running for both
x86_64
andaarch64
.How to test the change?
.circleci/images/primary/README.md
https://github.com/tonistiigi/binfmt
), and specifying the platform to docker with--platform linux/aarch64
A subsequent commit will also add a GitHub Workflow that will build these images in CI, with an option to publish them on the GitHub Container Registry at
ghcr.io
for everyone (including CircleCI) to use.