colima start no longer works on macos-12 runner 20230812.3 #8104

Closed
2 of 10 tasks
joffrey-bion opened this issue Aug 16, 2023 · 24 comments
Labels
Area: Containers awaiting-deployment Code complete; awaiting deployment and/or deployment in progress bug report external OS: macOS

Comments

@joffrey-bion

Description

My docker setup action looks like this:

      # Docker is not installed on GitHub's MacOS hosted workers due to licensing issues
      - name: Setup docker (missing on MacOS)
        if: runner.os == 'macos'
        run: |
          brew install docker
          colima start
          
          # For testcontainers to find the Colima socket
          # https://github.com/abiosoft/colima/blob/main/docs/FAQ.md#cannot-connect-to-the-docker-daemon-at-unixvarrundockersock-is-the-docker-daemon-running
          sudo ln -sf $HOME/.colima/default/docker.sock /var/run/docker.sock
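As an alternative to the symlink, the docker CLI can be pointed at the Colima socket through the DOCKER_HOST environment variable. This is only a minimal sketch, assuming the default Colima profile and the same socket location as in the symlink above; the helper name colima_docker_host is hypothetical and not part of Colima.

```shell
# Hypothetical helper: print the unix:// URL of a Colima profile's Docker socket.
# Assumes the socket lives at $HOME/.colima/<profile>/docker.sock,
# which is the path used by the symlink in the step above.
colima_docker_host() {
  profile="${1:-default}"
  printf 'unix://%s/.colima/%s/docker.sock\n' "$HOME" "$profile"
}

# Export it so the docker CLI in this shell talks to Colima without touching /var/run.
DOCKER_HOST="$(colima_docker_host default)"
export DOCKER_HOST
```

In a workflow, an exported variable only lasts for the current step, so the value would have to be appended to the file named by $GITHUB_ENV to reach later steps. Whether testcontainers picks this up is not confirmed in this thread; the symlink is what was actually used here.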

This all worked correctly 5 hours ago:
https://github.com/joffrey-bion/chrome-devtools-kotlin/actions/runs/5882244388/job/15952382762

But right now the colima start command fails:
https://github.com/joffrey-bion/chrome-devtools-kotlin/actions/runs/5884334898/job/15960009886

time="2023-08-16T23:10:01Z" level=info msg="starting colima"
time="2023-08-16T23:10:01Z" level=info msg="runtime: docker"
time="2023-08-16T23:10:01Z" level=info msg="preparing network ..." context=vm
time="2023-08-16T23:10:01Z" level=info msg="creating and starting ..." context=vm
time="2023-08-16T23:10:01Z" level=info msg="Terminal is not available, proceeding without opening an editor"
time="2023-08-16T23:10:01Z" level=info msg="Attempting to download the image" arch=x86_64 digest="sha512:f761b807fe9ba345968df72c07f8c5abcae0c4a44976fe5595c0ff748ef693841221a70e663986c700b027cea32b7cac24d5490d4c721593c39f2b8840c362a2" location="https://github.com/abiosoft/alpine-lima/releases/download/colima-v0.5.5/alpine-lima-clm-3.18.0-x86_64.iso"
Downloading the image (alpine-lima-clm-3.18.0-x86_64.iso)

315.00 MiB / 315.00 MiB (100.00%) ? p/s
time="2023-08-16T23:10:06Z" level=info msg="Downloaded the image from \"https://github.com/abiosoft/alpine-lima/releases/download/colima-v0.5.5/alpine-lima-clm-3.18.0-x86_64.iso\""
time="2023-08-16T23:10:10Z" level=info msg="[hostagent] Starting QEMU (hint: to watch the boot progress, see \"/Users/runner/.lima/colima/serial*.log\")"
time="2023-08-16T23:10:10Z" level=info msg="SSH Local Port: 49179"
time="2023-08-16T23:10:10Z" level=info msg="[hostagent] Waiting for the essential requirement 1 of 5: \"ssh\""
time="2023-08-16T23:10:10Z" level=info msg="[hostagent] Driver stopped due to error: \"exit status 255\""
time="2023-08-16T23:10:10Z" level=info msg="[hostagent] Shutting down the host agent"
time="2023-08-16T23:10:10Z" level=warning msg="[hostagent] failed to exit SSH master"
time="2023-08-16T23:10:10Z" level=info msg="[hostagent] Shutting down QEMU with ACPI"
time="2023-08-16T23:10:10Z" level=warning msg="[hostagent] failed to open the QMP socket \"/Users/runner/.lima/colima/qmp.sock\", forcibly killing QEMU"
time="2023-08-16T23:10:10Z" level=info msg="[hostagent] QEMU has already exited"
time="2023-08-16T23:10:10Z" level=fatal msg="exiting, status={Running:false Degraded:false Exiting:true Errors:[] SSHLocalPort:0} (hint: see \"/Users/runner/.lima/colima/ha.stderr.log\")"
time="2023-08-16T23:10:10Z" level=fatal msg="error starting vm: error at 'creating and starting': exit status 1"

Platforms affected

  • Azure DevOps
  • GitHub Actions - Standard Runners
  • GitHub Actions - Larger Runners

Runner images affected

  • Ubuntu 20.04
  • Ubuntu 22.04
  • macOS 11
  • macOS 12
  • macOS 13
  • Windows Server 2019
  • Windows Server 2022

Image version and build link

Runner image: macos-12
Runner version: 20230812.3

https://github.com/joffrey-bion/chrome-devtools-kotlin/actions/runs/5884334898/job/15960009886

Is it regression?

Yes, it worked in 20230803.1 (earlier today)

Expected behavior

colima start succeeds and I can use the docker CLI.

Actual behavior

colima start fails with [hostagent] failed to open the QMP socket \"/Users/runner/.lima/colima/qmp.sock\", forcibly killing QEMU
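The QMP socket warning is probably a symptom rather than the cause: the fatal line in the log points at a host agent log file (ha.stderr.log) that may say more. A rough sketch for pulling that path out of the colima output so a later step can print the file; the function name lima_hint_path is hypothetical, and the pattern assumes the exact "(hint: see \"...\")" wording shown in the log above.

```shell
# Hypothetical helper: read colima/lima log lines on stdin and print the file
# path mentioned in a '(hint: see \"...\")' fragment, if there is one.
# Assumes the escaped-quote format seen in the fatal log line of this issue.
lima_hint_path() {
  sed -n 's/.*(hint: see \\"\([^"\\]*\)\\").*/\1/p'
}

# Usage sketch (commented out; needs colima on the machine):
# colima start 2>colima.log || cat "$(lima_hint_path < colima.log | head -n 1)"
```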

Repro steps

Use a workflow with a step like this to set up docker and colima:

      # Docker is not installed on GitHub's MacOS hosted workers due to licensing issues
      - name: Setup docker (missing on MacOS)
        if: runner.os == 'macos'
        run: |
          brew install docker
          colima start
          
          # For testcontainers to find the Colima socket
          # https://github.com/abiosoft/colima/blob/main/docs/FAQ.md#cannot-connect-to-the-docker-daemon-at-unixvarrundockersock-is-the-docker-daemon-running
          sudo ln -sf $HOME/.colima/default/docker.sock /var/run/docker.sock
@joffrey-bion
Author

joffrey-bion commented Aug 16, 2023

I found this related issue on the colima side: abiosoft/colima#614

It seems to be related to running this on M1 - did the runner's architecture change?

@shamil-mubarakshin
Contributor

Hello @joffrey-bion, thanks for reporting.
The architecture didn't change. We will investigate the issue.

@shamil-mubarakshin shamil-mubarakshin added OS: macOS Area: Containers investigate Collect additional information, like space on disk, other tool incompatibilities etc. and removed needs triage labels Aug 17, 2023
@shamil-mubarakshin shamil-mubarakshin self-assigned this Aug 17, 2023
@joffrey-bion
Author

joffrey-bion commented Aug 17, 2023

@shamil-mubarakshin FYI today the action is working fine again. I'm not sure what exactly happened. Maybe the specific runner the action was running on was in a broken state or something. Feel free to close the issue as it doesn't seem to be related to the actual runner image, but rather the runner's state.

@joffrey-bion
Author

joffrey-bion commented Aug 17, 2023

Oh wait, never mind that: the new successful run is actually on runner image version 20230803.1, so maybe the version was reverted? In that case it does seem that the issue is very much related to the runner version.

@joffrey-bion
Author

joffrey-bion commented Aug 18, 2023

Is there any way for me as a user to control the version of the runner image? Today the runners are back to 20230812.3 and my builds started failing again

@shamil-mubarakshin
Contributor

@joffrey-bion, there is no way to select a previous image version.
Could you try adding brew reinstall qemu to update qemu with the latest patch, or add the following to roll back to qemu 8.0.3:

curl -o ./qemu.rb https://raw.githubusercontent.com/Homebrew/homebrew-core/dc0669eca9479e9eeb495397ba3a7480aaa45c2e/Formula/qemu.rb
brew install ./qemu.rb

@joffrey-bion
Author

joffrey-bion commented Aug 18, 2023

Thanks for the suggestion, I tried brew reinstall qemu and I get the same error:
https://github.com/joffrey-bion/chrome-devtools-kotlin/actions/runs/5901047187/job/16006458286

Note that during the reinstall of qemu, it doesn't seem to download a later version ("already downloaded"):

==> Fetching qemu
==> Downloading https://ghcr.io/v2/homebrew/core/qemu/manifests/8.0.4
Already downloaded: /Users/runner/Library/Caches/Homebrew/downloads/0e3980f7747a02bd917e21c8a967f1f1755f3a042cbd2ab81bee7767ecb31acb--qemu-8.0.4.bottle_manifest.json
==> Downloading https://ghcr.io/v2/homebrew/core/qemu/blobs/sha256:60a1e81578c0c1c43bd66663922c03cfcd8686bf54a3c486f1073f49309bc4c4
Already downloaded: /Users/runner/Library/Caches/Homebrew/downloads/8548411e2c9fea6836096e6d3338db1c17ef2ba143eba7617aaadb0609abc285--qemu--8.0.4.monterey.bottle.tar.gz
==> Reinstalling qemu 
==> Pouring qemu--8.0.4.monterey.bottle.tar.gz
🍺  /usr/local/Cellar/qemu/8.0.4: 162 files, 522.6MB

I will try the other suggestion.

@joffrey-bion
Author

I was trying the other suggestion, and realized sometimes the job is running on 20230803.1, so I get an error that version 8.0.3 of qemu is already installed 😆 How are the runner image versions selected? Why does it alternate between the 2?

@shamil-mubarakshin
Copy link
Contributor

Apologies for the inconvenience. An image rollout is in progress, so the runner's image version might be inconsistent.

@joffrey-bion
Author

No problem, I understand.

I have prepended the following to my action:

# Workaround for https://github.com/actions/runner-images/issues/8104
brew remove --ignore-dependencies qemu
curl -o ./qemu.rb https://raw.githubusercontent.com/Homebrew/homebrew-core/dc0669eca9479e9eeb495397ba3a7480aaa45c2e/Formula/qemu.rb
brew install ./qemu.rb

At the moment I only saw runs on 20230803.1, so I cannot confirm that it's a valid workaround, but I can at least say that it doesn't break on the old runner image 😄

I'll report back when I get to run this on a new runner image.
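Since jobs alternate between the two image versions and the downgrade errors out when qemu 8.0.3 is already installed, the workaround could be guarded by a version check. A minimal sketch, assuming qemu is installed through Homebrew and that 8.0.3 is the only version treated as known-good; the function name needs_qemu_downgrade is hypothetical.

```shell
# Hypothetical helper: decide whether the downgrade is needed, given the
# output of `brew list --versions qemu` (for example "qemu 8.0.4").
# Assumption: only 8.0.3 counts as known-good, as suggested in this thread.
needs_qemu_downgrade() {
  installed="$(printf '%s\n' "$1" | awk '{print $2}')"
  [ "$installed" != "8.0.3" ]
}

# Usage sketch (commented out; needs Homebrew on the machine):
# if needs_qemu_downgrade "$(brew list --versions qemu)"; then
#   brew remove --ignore-dependencies qemu
#   curl -o ./qemu.rb https://raw.githubusercontent.com/Homebrew/homebrew-core/dc0669eca9479e9eeb495397ba3a7480aaa45c2e/Formula/qemu.rb
#   brew install ./qemu.rb
# fi
```

The check compares against a single hard-coded version on purpose, so it stops applying the downgrade only on images that still ship 8.0.3; it would need revisiting once a fixed bottle is rolled out.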

@shamil-mubarakshin
Contributor

Your downgrade steps seem to work. I also tried the following for the latest qemu 8.0.4, but it wasn't consistent:

export HOMEBREW_NO_INSTALLED_DEPENDENTS_CHECK=1
brew update
brew reinstall qemu

There are a few threads about qemu and colima: abiosoft/colima#777 and Homebrew/homebrew-core#139409. Fix on runner image should arrive with next release cycle.

@shamil-mubarakshin shamil-mubarakshin added awaiting-deployment Code complete; awaiting deployment and/or deployment in progress external and removed investigate Collect additional information, like space on disk, other tool incompatibilities etc. labels Aug 18, 2023
joffrey-bion added a commit to joffrey-bion/krossbow that referenced this issue Aug 18, 2023
@joffrey-bion
Author

Thanks, indeed I could verify that it works with the workaround even on runner 20230812.3.

Fix on runner image should arrive with next release cycle

Awesome, thanks for the heads up. Do you have an estimate on when the next release will likely happen?

@shamil-mubarakshin
Contributor

@joffrey-bion, the macos-11 and macos-12 rollouts with the 8.0.4-1 qemu bottle are in progress and should take around a week to complete.

@joffrey-bion
Author

Thank you so much for your help and for the information 🙏 Have a nice day!

@AkihiroSuda

Workaround

cat >entitlements.xml <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>com.apple.security.hypervisor</key>
    <true/>
</dict>
</plist>
EOF

codesign --sign - --entitlements entitlements.xml --force /usr/local/bin/qemu-system-x86_64
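To check whether a given qemu binary already carries the entitlement before re-signing it, the entitlements can be dumped with codesign and searched for the key. This is a sketch only: the function name has_hypervisor_entitlement is hypothetical, and it merely looks for the key name in the dump without parsing the value.

```shell
# Hypothetical helper: succeed if the text on stdin mentions the hypervisor
# entitlement key. It does not verify that the value is true.
has_hypervisor_entitlement() {
  grep -qF 'com.apple.security.hypervisor'
}

# Usage sketch on a macOS runner (commented out; path taken from the workaround above):
# if codesign -d --entitlements - /usr/local/bin/qemu-system-x86_64 2>/dev/null | has_hypervisor_entitlement; then
#   echo "entitlement present"
# else
#   echo "entitlement missing"
# fi
```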

The proper fix is proposed here:

@nok

nok commented Aug 26, 2023

I had the same issue on macOS Big Sur (11.7.8) and the workaround by @AkihiroSuda worked. Thanks!

@rfay

rfay commented Aug 27, 2023

After all the fixes, this still isn't working on most runners; I assume that just takes time.

@joffrey-bion
Author

joffrey-bion commented Aug 28, 2023

@rfay could you please share the runner image version that was used in the builds that fail for you?

Have you tried my workaround above?

# Workaround for https://github.com/actions/runner-images/issues/8104
brew remove --ignore-dependencies qemu
curl -o ./qemu.rb https://raw.githubusercontent.com/Homebrew/homebrew-core/dc0669eca9479e9eeb495397ba3a7480aaa45c2e/Formula/qemu.rb
brew install ./qemu.rb

# Then the usual
brew install docker
colima start

@rfay

rfay commented Aug 28, 2023

Hi and thanks @joffrey-bion -

2.308.0 - See https://github.com/ddev/ddev/actions/runs/5979159856/job/16272595813

Runner Image: 20230818.2

No, I haven't tried the alternate qemu yet, as I've already done two PRs that temporarily fixed this, but it keeps breaking again. I'm waiting for it to stabilize. Following all the issues.

@rfay

rfay commented Aug 30, 2023

I did

with your workaround and it's working.

I don't understand why the upstream fixes to qemu in homebrew didn't solve this. I know you'll keep us informed about updates to the runner image.

@mikhailkoliada
Contributor

This should be fixed in the image now that we have rolled out the newer one to all customers. Let us know if something does not work for you.

@rfay

rfay commented Aug 30, 2023

Still working on this. Just had complete failure on a runner with 20230825.1

Every colima start run warns:

/Users/runner/.colima/_wrapper/4e1b408f843d1c63afbbdcf80c40e4c88d33509f/bin/qemu-system-x86_64" is not properly signed with the "com.apple.security.hypervisor" entitlement" error="failed to run [codesign --verify /Users/runner/.colima/_wrapper/4e1b408f843d1c63afbbdcf80c40e4c88d33509f/bin/qemu-system-x86_64]: exit status 1 (out="/Users/runner/.colima/_wrapper/4e1b408f843d1c63afbbdcf80c40e4c88d33509f/bin/qemu-system-x86_64: code object is not signed at all\nIn architecture: x86_64

Example: https://github.com/ddev/ddev/actions/runs/6030400257/job/16362122432?pr=5309#step:9:95

I seem to be able to get this to start successfully now by adding a brew install --reinstall qemu
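Because the outcome depends on which image the job lands on, one option is to attempt the start first and only reinstall qemu when it fails. A minimal sketch of that "try, repair once, retry" pattern; the function name start_with_repair is hypothetical, and brew reinstall qemu is used as the repair step because it is the form that appears earlier in this thread. Whether a failed colima start leaves state behind that needs cleanup is not covered here.

```shell
# Hypothetical helper: run a start command; if it fails, run a repair command
# once and retry. Both arguments are command strings evaluated with `sh -c`.
start_with_repair() {
  start_cmd="$1"
  repair_cmd="$2"
  if sh -c "$start_cmd"; then
    return 0
  fi
  echo "start failed; running repair step and retrying" >&2
  sh -c "$repair_cmd" && sh -c "$start_cmd"
}

# Usage sketch (commented out; needs colima and Homebrew on the machine):
# start_with_repair "colima start" "brew reinstall qemu"
```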

@strophy

strophy commented Sep 7, 2023

This still fails for me with the following runner version:

Current runner version: '2.308.0'
Operating System
  macOS
  12.6.7
  21G651
Runner Image
  Image: macos-12
  Version: 20230825.1
  Included Software: https://github.com/actions/runner-images/blob/macOS-12/20230825.1/images/macos/macos-12-Readme.md
  Image Release: https://github.com/actions/runner-images/releases/tag/macOS-12%2F20230825.1
Runner Image Provisioner
  2.0.266.1

@shamil-mubarakshin
Contributor

It seems there were recent updates to brew qemu, which is why qemu v8.1.0 got into the 20230825.1 image. You can refer to Homebrew/homebrew-core#140244 and the brew repository for more info.
On the images side, the ongoing 20230901.1 release has the following qemu installed, which hopefully resolves the issues:

    vsphere-clone: ==> Fetching qemu
    vsphere-clone: ==> Downloading https://ghcr.io/v2/homebrew/core/qemu/manifests/8.1.0_1-1
    vsphere-clone: ==> Downloading https://ghcr.io/v2/homebrew/core/qemu/blobs/sha256:246862506a64cbe52bce23f20ca8b1b7474618a00e2114c2ab9c06066ac58dde
