Description
Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
0.6.1
Deployment Method
Helm
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
- Create a runner in kubernetes mode by the book.
Runner template
```yaml
template:
  metadata:
    labels:
      app: myarc
  spec:
    initContainers:
      - name: init-k8s-volume-permissions
        image: ghcr.io/actions/actions-runner:latest
        command: ["/bin/sh", "-c"]
        args:
          - |
            sudo chown -R 1001:123 /home/runner/_work
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
            value: "true"
          - name: ACTIONS_RUNNER_CONTAINER_HOOKS
            value: /home/runner/k8s/index.js
          - name: ACTIONS_RUNNER_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: /home/runner/pod-templates/default.yaml
        securityContext:
          runAsUser: 1001
          runAsGroup: 123
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            memory: 512Mi
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: pod-templates
            mountPath: /home/runner/pod-templates
            readOnly: true
    volumes:
      - name: pod-templates
        configMap:
          name: pod-templates
```

Setting `minRunners: 0` and `maxRunners: 5` also helps highlight this issue in my example.
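For completeness, here is a minimal sketch of the Helm values I mean (assuming the official gha-runner-scale-set chart; the URL and secret name are placeholders from my setup):

```yaml
# Sketch of the scale set values -- only minRunners/maxRunners matter here.
githubConfigUrl: https://github.com/my-org/my-repo   # placeholder URL
githubConfigSecret: gh-arc-secret                    # placeholder secret name
minRunners: 0   # scale to zero runners when idle
maxRunners: 5
template:
  # ... runner template as shown above ...
```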
- Make sure you are using an autoscaling node pool (one that can scale down to very few, or even zero, nodes); see the sketch below.
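For example, on EKS with eksctl this could look like the following (just a sketch; names and sizes are hypothetical, and any provider whose node pools scale to zero behaves the same way):

```yaml
# Sketch: a managed node group that the cluster autoscaler can shrink to zero.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ci-cluster          # hypothetical cluster name
  region: eu-west-1
managedNodeGroups:
  - name: arc-runners       # hypothetical node group name
    instanceType: m5.xlarge # ~16GB of memory, as in this reproduction
    minSize: 0              # allows scale-to-zero
    maxSize: 5
```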
- Make sure the workflow pod has large memory/CPU requests (configured via the PodTemplate below).
PodTemplate ConfigMap
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: pod-templates
  namespace: gh-arc
data:
  default.yaml: |
    ---
    apiVersion: v1
    kind: PodTemplate
    metadata:
      name: runner-pod-template
      namespace: gh-arc
      labels:
        app: runner-pod-template
    spec:
      securityContext:
        runAsUser: 1001
        runAsGroup: 123
      containers:
        - name: $job
          resources:
            requests:
              cpu: 1000m
              memory: 8Gi
            limits:
              memory: 8Gi
```
- Run a simple pipeline that spawns multiple jobs, and hence multiple runners. Here I'm using `strategy:` to make sure they all spawn at nearly the same time, and `container:` to make sure a `-workflow` pod is created.
Actions CI
```yaml
name: GitHub Actions Test
run-name: Test
on: [push]
jobs:
  foo:
    runs-on: myarc
    container: debian
    strategy:
      matrix:
        package:
          - 'common'
          - 'utils'
          - 'ui'
          - 'billing'
    steps:
      - run: echo "Running for ${{ matrix.package }}"
      - name: Check out repository code
        uses: actions/checkout@v3
      - run: sleep 300
```

- Observe your CI jobs failing.
Describe the bug
The runner-scale-set controller receives the workflow run and scales the runner set up to 4 runners.
Each runner pod gets scheduled onto a node. Since each one only requests 512Mi of memory, they all fit on the same 16GB node.
After initializing, each runner spawns a `-workflow` pod next to itself. Kubernetes now tries to schedule 4 pods, each requesting 8Gi of memory, onto that same single node. This fails, since the node does not have enough memory.
On GitHub Actions you can see your job failing at the "Initialize containers" step, with:
```
Run '/home/runner/k8s/index.js'
Warning: Skipping name override: name can't be overwritten
Error: Error: failed to create job pod: HttpError: HTTP request failed
Error: Process completed with exit code 1.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
```
Describe the expected behavior
The `-workflow` pod should be scheduled by kube-scheduler onto a different node.
I suspect the container hook does not rely on kube-scheduler's normal placement, and every workflow pod necessarily ends up next to its runner pod, maybe due to a constraint I haven't spotted (volumes, maybe?); see the sketch below.
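To illustrate my guess (a sketch of my understanding, not verified against the hook source): in kubernetes mode the runner's `_work` directory is typically backed by a claim like the one below, and if the hook mounts that same claim into the `-workflow` pod, a `ReadWriteOnce` volume would force both pods onto the same node:

```yaml
# Hypothetical sketch of the shared work volume that could pin the pods together.
volumes:
  - name: work
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]   # attachable to a single node only
          resources:
            requests:
              storage: 1Gi
```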
I've seen a lot of people setting resource requests on the runner container, but I fail to see how that solves my issue, since the job actually runs in the workflow pod.
Additional Context
All configs are in the reproduction steps.
We are mostly trying to reduce our Kubernetes node costs, and we don't find it acceptable to keep a few large nodes idle just in case a CI pipeline gets triggered. Hence the requirement to scale the node pool to zero and to set the runner minimum to zero as well.
We also provide multiple runner sizes, so our developers can simply pick the desired runner size with labels like memory-xl, cpu-s, etc.
Controller Logs
https://gist.github.com/guillaumevillemont/9d6bb8cd62ef5c1dd5b78f30b225a182

Runner Pod Logs
https://gist.github.com/guillaumevillemont/9d6bb8cd62ef5c1dd5b78f30b225a182