Skip to content

Self-hosted runners terminated prematurely #1255

Closed

Description

Traces

1

$ journalctl --unit cml
-- Logs begin at Sat 2022-10-08 21:59:41 UTC, end at Thu 2022-11-17 14:42:27 UTC. --
Nov 17 10:53:35 ip-172-31-81-177 systemd[1]: Started cml.service.
Nov 17 10:53:35 ip-172-31-81-177 cml.sh[2214]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Nov 17 10:53:35 ip-172-31-81-177 cml.sh[2214]:                                  Dload  Upload   Total   Spent    Left  Speed
Nov 17 10:53:35 ip-172-31-81-177 cml.sh[2214]: [158B blob data]
Nov 17 10:53:36 ip-172-31-81-177 cml.sh[2214]: {"level":"warn","message":"Github Actions timeout has been updated from 72h to 35 days. Update your workflow accordingly to be able to restart it automatically."}
Nov 17 10:53:36 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Preparing workdir /tmp/tmp.FFQVuNxrbe/.cml/cml-679w0e9r8o..."}
Nov 17 10:53:36 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Launching github runner"}
Nov 17 10:53:41 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Terraform 1.3.2"}
Nov 17 10:53:41 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Plan: 0 to add, 0 to change, 0 to destroy."}
Nov 17 10:53:41 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Apply complete! Resources: 0 added, 0 changed, 0 destroyed."}
Nov 17 10:53:41 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Outputs: 0"}
Nov 17 10:53:41 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Connected to acpid service."}
Nov 17 10:53:51 ip-172-31-81-177 cml.sh[2214]: {"date":"2022-11-17T10:53:51.690Z","level":"info","message":"runner status","repo":"https://github.com/iterative/cml-textual-inversion","status":"ready"}
Nov 17 10:58:23 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 153 seconds!"}
Nov 17 11:00:56 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 11:00:56 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 11:00:56 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 11:00:56 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 11:04:40 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 3377 seconds!"}
Nov 17 12:00:57 ip-172-31-81-177 cml.sh[2214]: {"date":"Thu Nov 17 2022 12:00:57 GMT+0000 (Coordinated Universal Time)","error":{"name":"HttpError","request":{"headers":{"accept":"application/vnd.github.v3+json","authorization":"to
Nov 17 12:00:57 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Unregistering runner cml-679w0e9r8o..."}
Nov 17 12:00:57 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 12:00:57 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 12:00:57 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 12:00:57 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 12:00:57 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Retrying after 0 seconds!"}
Nov 17 12:00:58 ip-172-31-81-177 cml.sh[2214]: {"level":"error","message":"\tFailed: Bad request - Runner \"cml-679w0e9r8o\" is still running a job\""}
Nov 17 12:00:58 ip-172-31-81-177 cml.sh[2214]: {"level":"info","message":"Waiting 10 seconds to destroy"}
Nov 17 12:01:00 ip-172-31-81-177 systemd[1]: cml.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 12:01:02 ip-172-31-81-177 systemd[1]: cml.service: Failed with result 'exit-code'.

2

-- Logs begin at Sat 2022-10-08 21:59:41 UTC, end at Fri 2022-12-30 16:03:43 UTC. --
Dec 30 04:51:17 ip-172-31-80-146 systemd[1]: Started cml.service.
Dec 30 04:51:17 ip-172-31-80-146 cml.sh[2897]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dec 30 04:51:17 ip-172-31-80-146 cml.sh[2897]:                                  Dload  Upload   Total   Spent    Left  Speed
Dec 30 04:51:17 ip-172-31-80-146 cml.sh[2897]: [158B blob data]
Dec 30 04:51:18 ip-172-31-80-146 cml.sh[2897]: {"level":"warn","message":"Github Actions timeout has been updated from 72h to 35 days. Update your workflow accordingly to be able to restart it automatically."}
Dec 30 04:51:18 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Preparing workdir /tmp/tmp.DLtgmegfm3/.cml/cml-r6af2ku8y0-joxcs4iu-5bwnb9sg..."}
Dec 30 04:51:18 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Launching github runner"}
Dec 30 04:51:23 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Terraform 1.3.2"}
Dec 30 04:51:24 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Plan: 0 to add, 0 to change, 0 to destroy."}
Dec 30 04:51:24 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Apply complete! Resources: 0 added, 0 changed, 0 destroyed."}
Dec 30 04:51:24 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Outputs: 0"}
Dec 30 04:51:24 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Connected to acpid service."}
Dec 30 04:51:34 ip-172-31-80-146 cml.sh[2897]: {"date":"2022-12-30T04:51:34.742Z","level":"info","message":"runner status","repo":"https://github.com/iterative/cml-textual-inversion","status":"ready"}
Dec 30 05:01:35 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Unregistering runner cml-r6af2ku8y0-joxcs4iu-5bwnb9sg..."}
Dec 30 05:01:35 ip-172-31-80-146 cml.sh[2897]: {"level":"warn","message":"\tCancelling shutdown: Bad request - Runner \"cml-r6af2ku8y0-joxcs4iu-5bwnb9sg\" is still running a job\""}
Dec 30 05:11:37 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Unregistering runner cml-r6af2ku8y0-joxcs4iu-5bwnb9sg..."}
Dec 30 05:11:37 ip-172-31-80-146 cml.sh[2897]: {"level":"warn","message":"\tCancelling shutdown: Bad request - Runner \"cml-r6af2ku8y0-joxcs4iu-5bwnb9sg\" is still running a job\""}
Dec 30 05:21:38 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Unregistering runner cml-r6af2ku8y0-joxcs4iu-5bwnb9sg..."}
Dec 30 05:21:39 ip-172-31-80-146 cml.sh[2897]: {"level":"warn","message":"\tCancelling shutdown: Bad request - Runner \"cml-r6af2ku8y0-joxcs4iu-5bwnb9sg\" is still running a job\""}
Dec 30 05:31:40 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Unregistering runner cml-r6af2ku8y0-joxcs4iu-5bwnb9sg..."}
Dec 30 05:31:40 ip-172-31-80-146 cml.sh[2897]: {"level":"warn","message":"\tCancelling shutdown: Bad request - Runner \"cml-r6af2ku8y0-joxcs4iu-5bwnb9sg\" is still running a job\""}
Dec 30 05:41:41 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Unregistering runner cml-r6af2ku8y0-joxcs4iu-5bwnb9sg..."}
Dec 30 05:41:42 ip-172-31-80-146 cml.sh[2897]: {"level":"warn","message":"\tCancelling shutdown: Bad request - Runner \"cml-r6af2ku8y0-joxcs4iu-5bwnb9sg\" is still running a job\""}
Dec 30 05:51:43 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Unregistering runner cml-r6af2ku8y0-joxcs4iu-5bwnb9sg..."}
Dec 30 05:51:43 ip-172-31-80-146 cml.sh[2897]: {"level":"warn","message":"\tCancelling shutdown: Bad request - Runner \"cml-r6af2ku8y0-joxcs4iu-5bwnb9sg\" is still running a job\""}
Dec 30 06:01:45 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Unregistering runner cml-r6af2ku8y0-joxcs4iu-5bwnb9sg..."}
Dec 30 06:01:45 ip-172-31-80-146 cml.sh[2897]: {"level":"warn","message":"\tCancelling shutdown: Bad request - Runner \"cml-r6af2ku8y0-joxcs4iu-5bwnb9sg\" is still running a job\""}
Dec 30 06:11:46 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Unregistering runner cml-r6af2ku8y0-joxcs4iu-5bwnb9sg..."}
Dec 30 06:11:47 ip-172-31-80-146 cml.sh[2897]: {"level":"warn","message":"\tCancelling shutdown: Bad request - Runner \"cml-r6af2ku8y0-joxcs4iu-5bwnb9sg\" is still running a job\""}
Dec 30 06:21:48 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Unregistering runner cml-r6af2ku8y0-joxcs4iu-5bwnb9sg..."}
Dec 30 06:21:48 ip-172-31-80-146 cml.sh[2897]: {"level":"warn","message":"\tCancelling shutdown: Bad request - Runner \"cml-r6af2ku8y0-joxcs4iu-5bwnb9sg\" is still running a job\""}
Dec 30 06:24:10 ip-172-31-80-146 cml.sh[2897]: {"date":"2022-12-30T06:24:10.677Z","level":"info","message":"runner status","repo":"https://github.com/iterative/cml-textual-inversion","status":"job_ended","success":true}
Dec 30 06:31:49 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Unregistering runner cml-r6af2ku8y0-joxcs4iu-5bwnb9sg..."}
Dec 30 06:31:50 ip-172-31-80-146 cml.sh[2897]: {"level":"warn","message":"\tCancelling shutdown: Bad request - Runner \"cml-r6af2ku8y0-joxcs4iu-5bwnb9sg\" is still running a job\""}
Dec 30 06:41:51 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Unregistering runner cml-r6af2ku8y0-joxcs4iu-5bwnb9sg..."}
Dec 30 06:41:52 ip-172-31-80-146 cml.sh[2897]: {"level":"warn","message":"\tCancelling shutdown: Bad request - Runner \"cml-r6af2ku8y0-joxcs4iu-5bwnb9sg\" is still running a job\""}
Dec 30 06:51:02 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 1157 seconds!"}
Dec 30 06:51:16 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 1143 seconds!"}
Dec 30 06:51:53 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Unregistering runner cml-r6af2ku8y0-joxcs4iu-5bwnb9sg..."}
Dec 30 06:51:53 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 1106 seconds!"}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 0 seconds!"}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 0 seconds!"}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 0 seconds!"}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 0 seconds!"}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 0 seconds!"}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 0 seconds!"}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 0 seconds!"}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 0 seconds!"}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"error","message":"\tFailed: API rate limit exceeded for user ID 11387611."}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"\tSuccess"}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Waiting 10 seconds to destroy"}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 0 seconds!"}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 0 seconds!"}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"date":"Fri Dec 30 2022 07:10:19 GMT+0000 (Coordinated Universal Time)","error":{"name":"HttpError","request":{"headers":{"accept":"application/vnd.github.v3+json","authorization":"token [REDACTED]","user-agent":"octokit-rest.js/18.0.0 octokit-core.js/3.6.0 Node.js/16.17.1 (linux; x64)"},"method":"GET","request":{"agent":{},"retryCount":6},"url":"https://api.github.com/repos/iterative/cml-textual-inversion/actions/jobs/10359857170"},"response":{"data":{"documentation_url":"https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting","message":"API rate limit exceeded for user ID 11387611."},"headers":{"access-control-allow-origin":"*","access-control-expose-headers":"ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, X-GitHub-SSO, X-GitHub-Request-Id, Deprecation, Sunset","connection":"close","content-encoding":"gzip","content-security-policy":"default-src 'none'","content-type":"application/json; charset=utf-8","date":"Fri, 30 Dec 2022 07:10:19 GMT","referrer-policy":"origin-when-cross-origin, strict-origin-when-cross-origin","server":"GitHub.com","strict-transport-security":"max-age=31536000; includeSubdomains; preload","transfer-encoding":"chunked","vary":"Accept-Encoding, Accept, X-Requested-With","x-accepted-oauth-scopes":"repo","x-content-type-options":"nosniff","x-frame-options":"deny","x-github-media-type":"github.v3; format=json","x-github-request-id":"A90A:2F36:49D55C8:9784945:63AE8EDB","x-oauth-scopes":"admin:enterprise, admin:org, admin:org_hook, admin:public_key, admin:repo_hook, audit_log, codespace, delete:packages, delete_repo, gist, notifications, project, read:gpg_key, read:ssh_signing_key, repo, user, workflow, write:discussion, write:packages","x-ratelimit-limit":"5000","x-ratelimit-remaining":"0","x-ratelimit-reset":"1672384219","x-ratelimit-resource":"core","x-ratelimit-used":"5001","x-xss-protection":"0"},"status":403,"url":"https://api.github.com/repos/iterative/cml-textual-inversion/actions/jobs/10359857170"},"status":403},"exception":true,"level":"error","message":"unhandledRejection: API rate limit exceeded for user ID 11387611.\nHttpError: API rate limit exceeded for user ID 11387611.\n    at /usr/lib/node_modules/@dvcorg/cml/node_modules/@octokit/request/dist-node/index.js:86:21\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at async Job.doExecute (/usr/lib/node_modules/@dvcorg/cml/node_modules/bottleneck/light.js:405:18)","os":{"loadavg":[1.1,1.26,1.27],"uptime":8394.9},"process":{"argv":["/usr/bin/node","/usr/bin/cml","runner","--name","cml-r6af2ku8y0-joxcs4iu-5bwnb9sg","--labels","cml","--idle-timeout","600","--driver","github","--repo","https://github.com/iterative/cml-textual-inversion","--token","...","--tf-resource","..."],"cwd":"/","execPath":"/usr/bin/node","gid":0,"memoryUsage":{"arrayBuffers":1316610,"external":19109170,"heapTotal":37875712,"heapUsed":32279288,"rss":203804672},"pid":3022,"uid":0,"version":"v16.17.1"},"stack":"HttpError: API rate limit exceeded for user ID 11387611.\n    at /usr/lib/node_modules/@dvcorg/cml/node_modules/@octokit/request/dist-node/index.js:86:21\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at async Job.doExecute (/usr/lib/node_modules/@dvcorg/cml/node_modules/bottleneck/light.js:405:18)","trace":[{"column":21,"file":"/usr/lib/node_modules/@dvcorg/cml/node_modules/@octokit/request/dist-node/index.js","function":null,"line":86,"method":null,"native":false},{"column":null,"file":null,"function":"runMicrotasks","line":null,"method":null,"native":false},{"column":5,"file":"node:internal/process/task_queues","function":"processTicksAndRejections","line":96,"method":null,"native":false},{"column":18,"file":"/usr/lib/node_modules/@dvcorg/cml/node_modules/bottleneck/light.js","function":"async Job.doExecute","line":405,"method":"doExecute","native":false}]}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 0 seconds!"}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 0 seconds!"}
Dec 30 07:10:19 ip-172-31-80-146 cml.sh[2897]: {"level":"info","message":"Retrying after 0 seconds!"}
Dec 30 07:10:22 ip-172-31-80-146 systemd[1]: cml.service: Main process exited, code=exited, status=1/FAILURE
Dec 30 07:10:23 ip-172-31-80-146 systemd[1]: cml.service: Failed with result 'exit-code'.

Debugging CML “in production”

  1. Create a new branch on this repository, e.g. debug
  2. Edit bin/cml.js on the new branch and replace the shebang with #!/usr/bin/node --inspect-brk
  3. Pass --cml-version=github:iterative/cml#debug1 to cml runner launch to use the new branch
  4. Clone & checkout the debug branch locally and open it with Visual Studio Code, then create a launch.json file:
     {
         "version": "0.2.0",
         "configurations": [
             {
                 "address": "localhost:9229/2faa730c-1670-42d8-8bb5-5a5630c0be1d",
                 "localRoot": "${workspaceFolder}",
                 "name": "Attach to Remote",
                 "port": 9229,
                 "remoteRoot": "/usr/lib/node_modules/@dvcorg/cml",
                 "request": "attach",
                 "skipFiles": [
                     "<node_internals>/**"
                 ],
                 "type": "node"
             }
         ]
     }
  5. Run ssh -L 9229:localhost:9229 ubuntu@IP_ADDRESS locally to redirect the debugging port
  6. Use Visual Studio Code debugger to connect and resume the execution

Footnotes

  1. Pass also e.g. --cloud-startup-script=$(echo 'curl https://github.com/0x2b3bfa0.keys >> /home/ubuntu/.ssh/authorized_keys' | base64 -w 0) to ensure SSH access.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions