stable: new release on 2020-08-11 (32.20200726.3.1) #158

Closed
34 tasks done
lucab opened this issue Jul 28, 2020 · 23 comments

@lucab
Contributor

lucab commented Jul 28, 2020

First, verify that you meet all the prerequisites

Name this issue stable: new release on YYYY-MM-DD with today's date. Once the pipeline spits out the new version ID, you can append it to the title e.g. (31.20191117.3.0).
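The version ID has four dot-separated fields. A minimal sketch for splitting one apart, assuming the usual Fedora CoreOS scheme in which the third field is a stream code (1 for next, 2 for testing, 3 for stable). That mapping is not stated in this issue, though it agrees with the version IDs quoted in this thread, and `parse_version_id` is a hypothetical helper rather than part of any release tooling:

```python
from typing import NamedTuple

# Assumed stream codes (not stated in this issue). Any other code is
# reported as unknown rather than guessed.
STREAM_CODES = {1: "next", 2: "testing", 3: "stable"}


class VersionID(NamedTuple):
    major: int       # Fedora major version, e.g. 32
    snapshot: str    # date stamp of the package snapshot, e.g. "20200726"
    stream: str      # stream name derived from the stream code
    revision: int    # revision counter within that snapshot


def parse_version_id(text: str) -> VersionID:
    """Split a version ID such as '32.20200726.3.1' into its fields."""
    # Tolerate the parenthesised form used in issue titles.
    parts = text.strip().strip("()").split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated fields, got {text!r}")
    major, snapshot, code, revision = parts
    if len(snapshot) != 8 or not snapshot.isdigit():
        raise ValueError(f"snapshot field is not YYYYMMDD: {snapshot!r}")
    stream = STREAM_CODES.get(int(code), f"unknown({code})")
    return VersionID(int(major), snapshot, stream, int(revision))


if __name__ == "__main__":
    print(parse_version_id("32.20200726.3.1"))
```

A check like this can catch a version ID pasted into the title from the wrong stream before the release job is run.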

Pre-release

Promote testing changes to stable

From the checkout for fedora-coreos-config (replace upstream below with
whichever remote name tracks coreos/):

  • git fetch upstream
  • git checkout stable
  • git reset --hard upstream/stable
  • /path/to/fedora-coreos-releng-automation/scripts/promote-config.sh testing
  • Sanity check promotion with git show
  • Open PR against the stable branch on https://github.com/coreos/fedora-coreos-config
  • Post a link to the PR as a comment to this issue
  • Ideally have at least one other person check it and approve
  • Once CI has passed, merge it

Build

  • Start a pipeline build (select stable, leave all other defaults)
  • Post a link to the job as a comment to this issue
  • Wait for the job to finish

Sanity-check the build

Using the build browser for the stable stream:

  • Verify that the parent commit and version match the previous stable release (in the future, we'll want to integrate this check in the release job)
  • Check kola AWS run to make sure it didn't fail
  • Check kola GCP run to make sure it didn't fail

⚠️ Release ⚠️

IMPORTANT: this is the point of no return. Once the OSTree commit is
imported into the unified repo, any machine that manually runs rpm-ostree upgrade will receive the new update.

Run the release job

  • Run the release job, filling in for parameters stable and the new version ID
  • Post a link to the job as a comment to this issue
  • Wait for job to finish
  • Verify that the OSTree commit and its signature are present and valid by booting a VM at the previous release (e.g. cosa run --qemu-image /path/to/previous.qcow2) and verifying that rpm-ostree upgrade works and rpm-ostree status shows a valid signature.

At this point, Cincinnati will see the new release on its next refresh and create a corresponding node in the graph without edges pointing to it yet.

Refresh metadata (stream and updates)

From a checkout of this repo:

  • Update stream metadata, by running:
fedora-coreos-stream-generator -releases=https://fcos-builds.s3.amazonaws.com/prod/streams/stable/releases.json  -output-file=streams/stable.json -pretty-print
  • Update the updates metadata, editing updates/stable.json:
    • Find the last-known-good release (whose rollout has a start_percentage of 1.0) and set its version to the most recent completed rollout
    • Delete releases with completed rollouts
    • Add a new rollout:
      • Set version field to the new version
      • Set start_epoch field to a future timestamp for the rollout start (e.g. date -d '2019/09/10 14:30UTC' +%s)
      • Set start_percentage field to 0.0
      • Set duration_minutes field to a reasonable rollout window (e.g. 2880 for 48h)
    • Update the last-modified field to current time (e.g. date -u +%Y-%m-%dT%H:%M:%SZ)

A reviewer can validate the start_epoch time by running date -u -d @<EPOCH>. An example of encoding and decoding in one step: date -d '2019/09/10 14:30UTC' +%s | xargs -I{} date -u -d @{}.
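The rollout fields above can also be computed rather than typed by hand. This is a minimal sketch, not part of the release tooling: `rollout_fields`, `last_modified`, and `decode_epoch` are hypothetical helpers, and the dictionary is flat on purpose because the checklist names the fields but not where they nest inside updates/stable.json.

```python
from datetime import datetime, timezone


def rollout_fields(version: str, start: datetime, window_hours: float) -> dict:
    """Build the four rollout values named in the checklist."""
    if start.tzinfo is None:
        # A naive datetime would be read as local time, which is an easy
        # way to schedule a rollout several hours off.
        raise ValueError("start must be timezone-aware")
    return {
        "version": version,
        "start_epoch": int(start.timestamp()),
        "start_percentage": 0.0,
        "duration_minutes": int(window_hours * 60),
    }


def last_modified(now: datetime) -> str:
    """Format a timestamp like `date -u +%Y-%m-%dT%H:%M:%SZ` does."""
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def decode_epoch(epoch: int) -> str:
    """Reviewer-side check: turn start_epoch back into a readable UTC time."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    start = datetime(2020, 8, 12, 15, 30, tzinfo=timezone.utc)
    fields = rollout_fields("32.20200726.3.1", start, window_hours=48)
    print(fields)
    print(decode_epoch(fields["start_epoch"]))
    print(last_modified(datetime.now(timezone.utc)))
```

Rejecting naive datetimes is a deliberate choice here: the checklist examples always spell out UTC, and the helper keeps that explicit.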

  • Commit the changes and open a PR against the repo.
  • Post a link to the PR as a comment to this issue
  • Wait for the PR to be approved.
  • Once approved, merge it and verify that the sync-stream-metadata job syncs the contents to S3
  • Verify the new version shows up on the download page
  • Verify the incoming edges are showing up in the update graph:
curl -H 'Accept: application/json' 'https://updates.coreos.fedoraproject.org/v1/graph?basearch=x86_64&stream=stable&rollout_wariness=0'
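Reading the curl output by eye gets tedious as the graph grows. The sketch below lists the releases that have an edge pointing at a given version. It assumes the response is a Cincinnati-style graph, with a `nodes` list whose entries carry a `version` and an `edges` list of `[from_index, to_index]` pairs; that shape is an assumption and is not shown in this issue. `incoming_edges` is a hypothetical helper, and `SAMPLE_GRAPH` is made-up data that reuses two version IDs from this thread:

```python
import json


def incoming_edges(graph: dict, version: str) -> list:
    """Return the versions that have an update edge pointing at `version`."""
    nodes = graph["nodes"]
    # Indexes of the node(s) carrying the version we care about.
    targets = {i for i, node in enumerate(nodes) if node["version"] == version}
    return [nodes[src]["version"] for src, dst in graph["edges"] if dst in targets]


# Hypothetical sample in the assumed shape, not a real server response.
SAMPLE_GRAPH = json.loads("""
{
  "nodes": [
    {"version": "32.20200715.3.0"},
    {"version": "32.20200726.3.1"}
  ],
  "edges": [[0, 1]]
}
""")

if __name__ == "__main__":
    print(incoming_edges(SAMPLE_GRAPH, "32.20200726.3.1"))
```

An empty result for the new version means the node exists in the graph but no release can update to it yet.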

NOTE: In the future, most of these steps will be automated.

Housekeeping

  • If one doesn't already exist, open an issue in this repo with the approximate date in the title of the next release in this stream.
@bgilbert
Contributor

bgilbert commented Jul 28, 2020

This release adds the PXE rootfs image.

@sinnykumari
Contributor

Promotion PR - coreos/fedora-coreos-config#560

@sinnykumari
Contributor

sinnykumari commented Aug 10, 2020

pipeline build failed:

Testing scenarios: [pxe-install iso-offline-install]
+ set -xeuo pipefail
+ kola testiso -SP --qemu-native-4k --output-dir tmp/kola-metal4k
Testing scenarios: [iso-offline-install]
error: The subcommand 'embed-kargs' wasn't recognized
	Did you mean 'embed'?

If you believe you received this message in error, try re-running with 'coreos-installer iso -- embed-kargs'

USAGE:
    coreos-installer iso
    coreos-installer iso <SUBCOMMAND>

Do we need coreos-installer 0.5.0-1.fc32 in stable as well?

@sinnykumari
Contributor

After looking at a successful pipeline run, the errors above look non-fatal. There is also a timeout right after them.
Here is the timeout error that follows the error message above:

For more information try --help
2020-08-10T15:21:02Z platform: running coreos-installer iso embed-kargs: exit status 1
Error: scenario iso-offline-install: timed out after 10m0s
2020-08-10T15:28:52Z cli: scenario iso-offline-install: timed out after 10m0s
[Pipeline] }
Failed in branch metal4k
Error: scenario iso-offline-install: timed out after 10m0s
2020-08-10T15:31:03Z cli: scenario iso-offline-install: timed out after 10m0s
[Pipeline] }
Failed in branch metal

@jlebon
Member

jlebon commented Aug 10, 2020

It's due to coreos/coreos-assembler#1643 (aka https://github.com/coreos/fedora-coreos-streams/issues/64).

I've pushed a branch with cosa just before that PR. Once it's built, you can rebuild stable with this parameter:

COREOS_ASSEMBLER_IMAGE=quay.io/coreos-assember/coreos-assembler:fcos-32.20200726.3.0

Edit: for posterity, the error is in the console logs of those tests:

error: ../../grub-core/fs/fshelp.c:257:file `/images/vmlinuz' not found.
error: ../../grub-core/loader/i386/efi/linux.c:205:you need to load the kernel

stable doesn't have the corresponding patch for the artifact move: coreos/fedora-coreos-config@346f770

@sinnykumari
Contributor

My bad, I forgot to override the COREOS_ASSEMBLER_IMAGE parameter during the pipeline build. I aborted it and started a new one with the right params - https://jenkins-fedora-coreos.apps.ci.centos.org/job/fedora-coreos/job/fedora-coreos-fedora-coreos-pipeline/13573/

@sinnykumari
Contributor

Re-triggered the build job with the typo fixed in the build parameter COREOS_ASSEMBLER_IMAGE (now quay.io/coreos-assembler/coreos-assembler:fcos-32.20200726.3.0) - https://jenkins-fedora-coreos.apps.ci.centos.org/job/fedora-coreos/job/fedora-coreos-fedora-coreos-pipeline/13574/

@sinnykumari
Contributor

signing failed:

+ cosa sign robosignatory --s3 fcos-builds/prod/streams/stable/builds --extra-fedmsg-keys stream=stable --images --gpgkeypath /etc/pki/rpm-gpg --fedmsg-conf /etc/fedora-messaging-cfg/fedmsg.toml
Successfully started consumer thread
Sending artifacts-sign request for build 32.20200726.3.0
Waiting for a response to the sent request
Traceback (most recent call last):
  File "/usr/lib/coreos-assembler/cmd-sign", line 277, in <module>
    sys.exit(main())
  File "/usr/lib/coreos-assembler/cmd-sign", line 42, in main
    args.func(args)
  File "/usr/lib/coreos-assembler/cmd-sign", line 100, in cmd_robosignatory
    robosign_images(args, s3, build, gpgkey)
  File "/usr/lib/coreos-assembler/cmd-sign", line 214, in robosign_images
    response = send_request_and_wait_for_response(
  File "/usr/lib/coreos-assembler/cosalib/fedora_messaging_request.py", line 63, in send_request_and_wait_for_response
    return wait_for_response(cond, request_timeout)
  File "/usr/lib/coreos-assembler/cosalib/fedora_messaging_request.py", line 103, in wait_for_response
    raise Exception("Timed out waiting for request response message")
Exception: Timed out waiting for request response message

Started new pipeline build with force - https://jenkins-fedora-coreos.apps.ci.centos.org/job/fedora-coreos/job/fedora-coreos-fedora-coreos-pipeline/13578/

@sinnykumari
Contributor

Failed during ostree signing.

+ cosa sign robosignatory --s3 fcos-builds/prod/streams/stable/builds --extra-fedmsg-keys stream=stable --ostree --gpgkeypath /etc/pki/rpm-gpg --fedmsg-conf /etc/fedora-messaging-cfg/fedmsg.toml
Uploading s3://fcos-builds/prod/streams/stable/builds/tmp/ostree-commit-object
Successfully started consumer thread
Sending ostree-sign request for build 32.20200726.3.1
Waiting for a response to the sent request
Traceback (most recent call last):
  File "/usr/lib/coreos-assembler/cmd-sign", line 277, in <module>
    sys.exit(main())
  File "/usr/lib/coreos-assembler/cmd-sign", line 42, in main
    args.func(args)
  File "/usr/lib/coreos-assembler/cmd-sign", line 97, in cmd_robosignatory
    robosign_ostree(args, s3, build, gpgkey)
  File "/usr/lib/coreos-assembler/cmd-sign", line 121, in robosign_ostree
    response = send_request_and_wait_for_response(
  File "/usr/lib/coreos-assembler/cosalib/fedora_messaging_request.py", line 63, in send_request_and_wait_for_response
    return wait_for_response(cond, request_timeout)
  File "/usr/lib/coreos-assembler/cosalib/fedora_messaging_request.py", line 103, in wait_for_response
    raise Exception("Timed out waiting for request response message")
Exception: Timed out waiting for request response message

Looks like this is not a glitch. I am going to stop retrying the build until we know the cause.

@dustymabe
Member

I spun a testing-devel build and it got past signing now, so hopefully we're back to a good state. Want to try to kick off a new round of builds?

@dustymabe
Member

started another pipeline build - https://jenkins-fedora-coreos.apps.ci.centos.org/job/fedora-coreos/job/fedora-coreos-fedora-coreos-pipeline/13581/

and... I killed it 😢 - we found an issue with the rootfs artifacts that needs to be fixed. coreos-installer can't verify the rootfs:

$ coreos-installer download -s testing -f pxe
gpg: Signature made Mon 10 Aug 2020 12:46:39 PM EDT
gpg:                using RSA key 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
gpg: Good signature from "Fedora (32) <fedora-32-primary@fedoraproject.org>" [ultimate]
> Read initramfs 693.1 MiB/693.1 MiB (100%)   
./fedora-coreos-32.20200809.2.0-live-initramfs.x86_64.img
gpg: Signature made Mon 10 Aug 2020 12:46:08 PM EDT
gpg:                using RSA key 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
gpg: Good signature from "Fedora (32) <fedora-32-primary@fedoraproject.org>" [ultimate]
> Read kernel 10.3 MiB/10.3 MiB (100%)   
./fedora-coreos-32.20200809.2.0-live-kernel-x86_64
gpg: Signature made Mon 10 Aug 2020 12:46:45 PM EDT
gpg:                using RSA key 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
gpg: Good signature from "Fedora (32) <fedora-32-primary@fedoraproject.org>" [ultimate]
Error: decoding first MiB of image 
Caused by: failed to fill whole buffer

So we're working on fixing that first.

dustymabe added a commit to dustymabe/fedora-coreos-streams that referenced this issue Aug 11, 2020
We found an issue with the PXE rootfs artifact where coreos-installer
can't verify it [1]. We're going to do a new set of builds so let's stop
the next/testing rollouts of 32.20200809.1.0 and 32.20200809.2.0 so we
don't update users two times in as many days.

[1] coreos#158 (comment)
dustymabe added a commit that referenced this issue Aug 11, 2020
We found an issue with the PXE rootfs artifact where coreos-installer
can't verify it [1]. We're going to do a new set of builds so let's stop
the next/testing rollouts of 32.20200809.1.0 and 32.20200809.2.0 so we
don't update users two times in as many days.

[1] #158 (comment)
@sinnykumari
Contributor

Started another pipeline build, with a cosa that has the fixes for the rootfs artifact - https://jenkins-fedora-coreos.apps.ci.centos.org/job/fedora-coreos/job/fedora-coreos-fedora-coreos-pipeline/13592/

@sinnykumari sinnykumari changed the title from "stable: new release on 2020-08-11" to "stable: new release on 2020-08-11(32.20200726.3.1 )" on Aug 12, 2020
@bgilbert bgilbert changed the title from "stable: new release on 2020-08-11(32.20200726.3.1 )" to "stable: new release on 2020-08-11 (32.20200726.3.1)" on Aug 12, 2020
@sinnykumari
Contributor

OSTree commit is present and has valid signature.

$ rpm-ostree status
State: idle
Deployments:
● ostree://fedora:fedora/x86_64/coreos/stable
                   Version: 32.20200726.3.1 (2020-08-12T05:29:32Z)
                    Commit: 2579b41aa614c3a40b9e24ff0b9dd288f99222dc3ed3a527ef0d8e8667196ff5
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0

  ostree://fedora:fedora/x86_64/coreos/stable
                   Version: 32.20200715.3.0 (2020-07-27T11:36:29Z)
                    Commit: a3b08ee51b1d950afd9d0d73f32d5424ad52c7703a6b5830e0dc11c3a682d869
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
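For anyone who wants to script this check instead of reading the output, here is a rough sketch that scans the human-readable `rpm-ostree status` text and reports, per deployment version, whether a "Valid signature" line follows it. `signature_report` is a hypothetical helper; it assumes the text layout shown above, which is not a stable interface, so treat it as an illustration only. The sample text is the output from this comment.

```python
def signature_report(status_text: str) -> dict:
    """Map each deployment version to True if its GPG signature line is valid."""
    report = {}
    current = None
    for line in status_text.splitlines():
        key, _, value = line.strip().partition(":")
        value = value.strip()
        if key == "Version" and value:
            # "32.20200726.3.1 (2020-08-12T05:29:32Z)" -> "32.20200726.3.1"
            current = value.split()[0]
            # Stay pessimistic until a valid signature line is seen.
            report[current] = False
        elif key == "GPGSignature" and current is not None:
            report[current] = value.startswith("Valid signature")
    return report


SAMPLE_STATUS = """\
State: idle
Deployments:
● ostree://fedora:fedora/x86_64/coreos/stable
                   Version: 32.20200726.3.1 (2020-08-12T05:29:32Z)
                    Commit: 2579b41aa614c3a40b9e24ff0b9dd288f99222dc3ed3a527ef0d8e8667196ff5
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0

  ostree://fedora:fedora/x86_64/coreos/stable
                   Version: 32.20200715.3.0 (2020-07-27T11:36:29Z)
                    Commit: a3b08ee51b1d950afd9d0d73f32d5424ad52c7703a6b5830e0dc11c3a682d869
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
"""

if __name__ == "__main__":
    for version, ok in signature_report(SAMPLE_STATUS).items():
        print(version, "signature ok" if ok else "NO VALID SIGNATURE")
```

A deployment with no GPGSignature line at all is reported as False, which errs on the side of flagging a problem.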

@sinnykumari
Contributor

update rollout - #167

@sinnykumari
Contributor

Incoming edges look good in the update graph.

@sinnykumari
Contributor

The rollout is scheduled to start on Wed 12 Aug 2020 at 03:30:00 PM UTC, with a duration of 48 hours.
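To see when that window closes and roughly how far along it is at a given moment, here is a small sketch. It assumes the rollout ramps linearly from start_percentage to 1.0 over duration_minutes and that nothing is offered before the start time; both are assumptions about the update server's behaviour, not something confirmed in this issue. `rollout_end` and `rollout_fraction` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone


def rollout_end(start: datetime, duration_minutes: int) -> datetime:
    """Moment at which the rollout window closes."""
    return start + timedelta(minutes=duration_minutes)


def rollout_fraction(now: datetime, start: datetime, duration_minutes: int,
                     start_percentage: float = 0.0) -> float:
    """Fraction of the rollout reached at `now`, assuming a linear ramp."""
    if now < start:
        return 0.0  # assumed: nothing is rolled out before the start time
    if duration_minutes <= 0:
        return 1.0  # a zero-length window is treated as already complete
    elapsed_minutes = (now - start).total_seconds() / 60
    progress = min(elapsed_minutes / duration_minutes, 1.0)
    return start_percentage + (1.0 - start_percentage) * progress


if __name__ == "__main__":
    start = datetime(2020, 8, 12, 15, 30, tzinfo=timezone.utc)
    print("window closes:", rollout_end(start, 2880).isoformat())
    print("fraction now:", rollout_fraction(datetime.now(timezone.utc), start, 2880))
```

With the values from this release (start 2020-08-12 15:30 UTC, 2880 minutes), the window closes on 2020-08-14 at 15:30 UTC.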

@sinnykumari
Contributor

The rollout window is over; all nodes should now be updated to the latest stable release.


5 participants