The operation was canceled. #2468
Comments
This has recently started happening at an onerous frequency in our Bazel monorepo build action, on a self-hosted runner. Happy to provide any troubleshooting info I can. Summary:
These events are happening despite us not clicking cancel on the action. |
Hey all, @magnetnation are you running hosted runner or using our image for self-hosted runner? |
Hi, We are using hosted runners, and here is an example of failed run. |
Also see this a lot in various PowerShell commands that run on GitHub runners. Seems to be very unpredictable, but definitely seeing this a lot across multiple jobs and commands. Have not seen this previously to this extent. |
I'm seeing the same thing here with a fairly large CMake C++ project build. It's behaving as if it has no swap and runs out of memory. I've had it happen in a WSL docker container before. I ended up needing to allocate more RAM to the virtual machine. I have no idea how that would translate to this situation. |
Any update on this issue? |
Same here, still seeing these issues @ruvceskistefan |
Any update on this? I've had this issue for a week now and nothing seems to solve it |
We have also been facing this issue very frequently for weeks now, even with an 8-core larger runner. How can I upload a shutdown log before the runner dies? It could have given me some insight. |
@ruvceskistefan I think the tag "awaiting-customer-response" is not appropriate any more, could you remove it? I think I have the same issue:
Could this be an OOM killer? Update: Feature request for better OOM-reporting: https://github.com/orgs/community/discussions/50571 |
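One way to test the OOM theory from inside a job is to log the memory headroom just before the heavy step. The following is only a minimal sketch, assuming a Linux runner where /proc/meminfo is readable; the helper names `parse_meminfo` and `headroom_mb` are hypothetical and not part of the runner.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def headroom_mb(fields: dict[str, int]) -> int:
    """Available RAM plus free swap, in MB (missing fields count as 0)."""
    return (fields.get("MemAvailable", 0) + fields.get("SwapFree", 0)) // 1024


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        fields = parse_meminfo(meminfo.read_text())
        print(f"memory headroom: {headroom_mb(fields)} MB")
```

Running this as a step before and after the build would show whether headroom is close to zero when the job is cancelled. It does not prove the OOM killer was involved, it only narrows things down.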
This issue started happening after I transferred a repository from one organization to another. It was working when the repo was on the other organization and now all jobs are being cancelled, no exception. There's no timeout set on the workflow file and the cancellation can happen anytime between ~30s and ~1m. Both organizations have exactly the same configuration and paid plan. Other repositories' workflows in the organization run just fine, including other transferred repositories. When I re-ran the failed jobs with debug log enabled, the only error I get on the Runner logs is this:
|
Having similar issue with Runner version: '2.303.0', Runner: 'ubuntu-latest-16-cores'. |
Having similar issue here: https://github.com/biaslab/ReactiveMP.jl/actions/runs/4576751595/jobs/8081938027 |
This is happening to us as well on a monorepo (Turborepo) in the Lint job (running with ESLint). |
We fixed it by limiting workers on an 8-core machine with:
|
@nuhcoka I'm confused because my CMake build should only run a single thread unless the -j flag is given (when build system is make or ninja). That's why I assumed it wasn't the issue. Maybe the default setting changed. |
@mathemaphysics But Gradle uses all cores by default, no? The official description of the flag:
|
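The reasoning behind limiting workers can be sketched as follows: if each worker needs a roughly fixed amount of memory, the safe worker count is bounded by memory as well as by cores. This is an illustrative sketch only. The function name `capped_workers` and the numbers in the usage line are hypothetical and are not the settings used by anyone in this thread.

```python
def capped_workers(cpu_count: int, available_mb: int, per_worker_mb: int) -> int:
    """Return a worker count bounded by both CPU cores and available memory."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    by_memory = available_mb // per_worker_mb
    # Always allow at least one worker so the build can make progress.
    return max(1, min(cpu_count, by_memory))


if __name__ == "__main__":
    # Hypothetical figures: 8 cores, 14 GB usable, about 3 GB per worker.
    print(capped_workers(cpu_count=8, available_mb=14_000, per_worker_mb=3_000))
```

The result would then be passed to whatever worker-limit option the build tool provides.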
We are having a similar issue as well where we have a process running for a really long time (like 6 hours) before it fails for no reason. See here |
I think your case may be different from others here. It is related to the 6-hour job execution limit instead. From the docs,
|
We're constantly having issues with this. It's unusable. Until it's fixed, we have to run it manually every time. Link to the action: https://github.com/papaya-insurtech/berry/actions/runs/5131060157/jobs/9230656685 |
We were able to fix it on our side: we had a mistake in our codebase that caused an enormous number of unnecessary allocations. We no longer have any issues now that the mistake has been fixed. It looks like the "The operation was canceled" error was definitely an OOM error, but a better error message would be nice. |
Hey @bvdmitri, what was the fix exactly? Maybe it can shed some light on ours, too 🙂 |
@nuhkoca |
For me, roughly every third GitHub Action run failed. Increasing the swap space to 10GB with the following step:
- name: Set Swap Space
  uses: pierotofy/set-swap-space@master
  with:
    swap-size-gb: 10
|
We also switched over to a new next-gen garbage collector, and it fixed most OOM problems. Instead of
This flag is needed to activate the CMS Collector in the first place. By default, HotSpot uses the Throughput Collector instead.
When the CMS collector is used, this flag activates the parallel execution of young generation GCs using multiple threads. It may seem surprising at first that we cannot simply reuse the flag -XX:+UseParallelGC known from the Throughput Collector, because conceptually the young generation GC algorithms used are the same. However, since the interplay between the young generation GC algorithm and the old generation GC algorithm is different with the CMS collector, there are two different implementations of young generation GC and thus two different flags. https://www.codecentric.de/wissens-hub/blog/useful-jvm-flags-part-7-cms-collector |
Current runners have a newer Ubuntu, but this breaks our piuparts run. The lsof program runs at 100% CPU for ~15 minutes, and the runner cancels the job after that. * actions/runner#2468 * actions/runner-images#7188
miri runs are failing lately. Checking if this is related to bigger RAM consumption of miri in latest releases. See [1] for the proposed solution. --- [1] actions/runner#2468 (comment)
…#2468#issuecomment-1651313943), and flake8-black
for me nothing works. I couldn't even find an image, or an example that works. Tried in a matrix, java and SDK combinations, all images, all platforms, all the different things. This is so frustrating. Why won't you people give us an example that just works ? Every single time I have to build a CI from scratch every few years, I have to go through this god damn rabbit hole. What is wrong with you people https://github.com/kaanx022/kaan/actions/runs/9042209780/job/24848448703 |
ok on ubuntu some of them are actually running with the perfect setup, the big matrix table is here: https://github.com/kaanx022/kaan/actions/runs/9042456492/job/24848994152 But we shouldn't have to run a matrix of ALL POSSIBLE COMBINATIONS just to figure out the one that works. |
Based on: actions/runner#2468 Test failures appear to occur due to an out-of-memory error. This attempts to increase the swap-space size on Ubuntu.
This seems to still be happening, e.g. with |
Describe the bug
Since last week, actions in our different repositories have started to fail with a similar error:
#[error]The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Workflow should be running without cancellations.
Runner Version and Platform
Image: ubuntu-22.04
Version: 20230219.1
Included Software: https://github.com/actions/runner-images/blob/ubuntu22/20230219.1/images/linux/Ubuntu2204-Readme.md
Image Release: https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20230219.1
OS of the machine running the runner?
Linux
What's not working?
Workflow is failing without any reason stated in the logs, apart from saying that it has been cancelled.
Job Log Output
2023-03-01T06:49:56.3959504Z > @mgnation/mgdata@1.4.212 _bundle
2023-03-01T06:49:56.3960408Z > node build/bundle.js
2023-03-01T06:49:56.3960710Z
2023-03-01T06:49:56.8884234Z Bundling has started
2023-03-01T06:49:56.9282005Z Copy results completed
2023-03-01T06:49:56.9290747Z Allow publish completed
2023-03-01T06:49:56.9295155Z build: 39.141ms
2023-03-01T06:49:56.9302039Z Bundling has finished
2023-03-01T06:49:57.2702808Z
2023-03-01T06:49:57.2708891Z > @mgnation/mgdata@1.4.212 test
2023-03-01T06:49:57.2709823Z > node ./node_modules/nyc/bin/nyc.js node ./tmp/spec/runner.js
2023-03-01T06:49:57.2710190Z
2023-03-01T06:50:56.1865343Z ##[error]The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
2023-03-01T06:50:56.2969758Z ##[debug]Re-evaluate condition on job cancellation for step: 'npm install, build, and test'.
2023-03-01T06:50:56.2973354Z ##[debug]Skip Re-evaluate condition on runner shutdown.
2023-03-01T06:50:56.5597757Z ----------|---------|----------|---------|---------|-------------------
2023-03-01T06:50:56.5612354Z File | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s
2023-03-01T06:50:56.5616821Z ----------|---------|----------|---------|---------|-------------------
2023-03-01T06:50:56.5618821Z All files | 0 | 0 | 0 | 0 |
2023-03-01T06:50:56.5620688Z ----------|---------|----------|---------|---------|-------------------
2023-03-01T06:50:56.6038836Z ##[error]The operation was canceled.
2023-03-01T06:50:56.6052060Z ##[debug]System.OperationCanceledException: The operation was canceled.
2023-03-01T06:50:56.6060563Z ##[debug] at System.Threading.CancellationToken.ThrowOperationCanceledException()
2023-03-01T06:50:56.6063113Z ##[debug] at GitHub.Runner.Sdk.ProcessInvoker.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Channel`1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken)
2023-03-01T06:50:56.6065859Z ##[debug] at GitHub.Runner.Common.ProcessInvokerWrapper.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Channel`1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken)
2023-03-01T06:50:56.6069900Z ##[debug] at GitHub.Runner.Worker.Handlers.DefaultStepHost.ExecuteAsync(IExecutionContext context, String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Boolean inheritConsoleHandler, String standardInInput, CancellationToken cancellationToken)
2023-03-01T06:50:56.6078953Z ##[debug] at GitHub.Runner.Worker.Handlers.ScriptHandler.RunAsync(ActionRunStage stage)
2023-03-01T06:50:56.6079410Z ##[debug] at GitHub.Runner.Worker.ActionRunner.RunAsync()
2023-03-01T06:50:56.6086736Z ##[debug] at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
2023-03-01T06:50:56.6102555Z ##[debug]Finishing: npm install, build, and test
2023-03-01T06:50:56.6376954Z ##[debug]Evaluating condition for step: 'Post Use Node.js 16.x'
2023-03-01T06:50:56.6379186Z ##[debug]Skip evaluate condition on runner shutdown.
2023-03-01T06:50:56.6392406Z ##[debug]Evaluating condition for step: 'Post Run actions/checkout@v3'
2023-03-01T06:50:56.6392834Z ##[debug]Skip evaluate condition on runner shutdown.
2023-03-01T06:50:56.6821240Z ##[debug]Starting: Complete job
2023-03-01T06:50:56.6833228Z Uploading runner diagnostic logs
2023-03-01T06:50:56.7043348Z ##[debug]Starting diagnostic file upload.
2023-03-01T06:50:56.7046324Z ##[debug]Setting up diagnostic log folders.
2023-03-01T06:50:56.7306655Z ##[debug]Creating diagnostic log files folder.
2023-03-01T06:50:56.7419356Z ##[debug]Copying 1 worker diagnostic logs.
2023-03-01T06:50:56.7470424Z ##[debug]Copying 1 runner diagnostic logs.
2023-03-01T06:50:56.7539403Z ##[debug]Zipping diagnostic files.
2023-03-01T06:50:56.8067735Z ##[debug]Uploading diagnostic metadata file.
2023-03-01T06:50:56.8441811Z ##[debug]Diagnostic file upload complete.
2023-03-01T06:50:56.8445882Z Completed runner diagnostic log upload
2023-03-01T06:50:56.8450728Z Cleaning up orphan processes
2023-03-01T06:50:56.9100720Z ##[debug]Finishing: Complete job
2023-03-01T06:50:56.9301578Z ##[debug]Finishing: build (16.x)
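When reading a log like the one above, it can help to measure how long the step was silent before the shutdown signal arrived. The following is a minimal sketch that parses the leading timestamps of two log lines and returns the gap in seconds. The helper names `parse_log_time` and `gap_seconds` are hypothetical, and the sketch assumes every line starts with a UTC timestamp in the format shown in this log.

```python
import re
from datetime import datetime, timezone

# Timestamp at the start of a job log line, e.g. 2023-03-01T06:49:57.2710190Z
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)Z")


def parse_log_time(line: str) -> datetime:
    """Parse the leading timestamp of a job log line into a UTC datetime."""
    match = _TS.match(line)
    if match is None:
        raise ValueError(f"no timestamp at start of line: {line!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    # The log has 7 fractional digits; datetime keeps 6, so truncate.
    micros = int(match.group(2)[:6].ljust(6, "0"))
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


def gap_seconds(earlier: str, later: str) -> float:
    """Seconds elapsed between the timestamps of two log lines."""
    return (parse_log_time(later) - parse_log_time(earlier)).total_seconds()


if __name__ == "__main__":
    last_output = "2023-03-01T06:49:57.2710190Z"
    shutdown = "2023-03-01T06:50:56.1865343Z ##[error]The runner has received a shutdown signal."
    print(f"{gap_seconds(last_output, shutdown):.1f} s between last output and shutdown")
```

For the two lines used here, the test step had been running for about 59 seconds with no output when the shutdown signal was received.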