[PartialAdmission] Job remains suspended when admitted using partial-admission #3140

Closed
mimowo opened this issue Sep 26, 2024 · 2 comments · Fixed by #3152
Labels: kind/bug (Categorizes issue or PR as related to a bug.)


mimowo commented Sep 26, 2024

What happened:

If the `kueue.x-k8s.io/job-min-parallelism` value equals the number of pods that fit in the free quota, the Workload is admitted but the Job remains suspended.

What you expected to happen:

The Workload is admitted and the Job is unsuspended.

How to reproduce it (as minimally and precisely as possible):

  1. Create the following config:
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "default-flavor"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue"
spec:
  clusterQueue: "cluster-queue"
  2. Submit the job:
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-big-
  labels:
    kueue.x-k8s.io/queue-name: user-queue
  annotations:
    kueue.x-k8s.io/job-min-parallelism: "9"
spec:
  parallelism: 10
  completions: 10
  completionMode: Indexed
  suspend: true
  template:
    spec:
      containers:
      - name: job-longrun
        image: python
        command:
        - python3
        - -c
        - |
          import os
          import time
          import sys
          id = int(os.environ.get("JOB_COMPLETION_INDEX"))
          time.sleep(5 + id*5)
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: "1"
            memory: "200Mi"
      restartPolicy: Never

Result: the Workload is admitted (resources are reserved), but the Job remains suspended.
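The numbers in the reproduction can be checked with a little arithmetic. The sketch below is a simplification, not Kueue's scheduler code: it assumes an otherwise empty ClusterQueue and identical pods, and the helper names `pods_that_fit` and `admitted_pod_count` are hypothetical. It shows why this Job lands exactly on its minimum parallelism.

```python
# Simplified arithmetic for the reproduction above; not Kueue's scheduler code.
CPU_QUOTA = 9                  # nominalQuota for "cpu"
MEMORY_QUOTA_MI = 36 * 1024    # nominalQuota for "memory" (36Gi), in Mi
CPU_PER_POD = 1                # resources.requests.cpu
MEMORY_PER_POD_MI = 200        # resources.requests.memory, in Mi

PARALLELISM = 10               # spec.parallelism
MIN_PARALLELISM = 9            # kueue.x-k8s.io/job-min-parallelism


def pods_that_fit(cpu_quota, mem_quota_mi, cpu_per_pod, mem_per_pod_mi):
    """Return how many identical pods fit in an otherwise empty quota."""
    return min(cpu_quota // cpu_per_pod, mem_quota_mi // mem_per_pod_mi)


def admitted_pod_count(parallelism, min_parallelism, fit):
    """Return the pod count to admit, or None if even the minimum does not fit."""
    if fit >= parallelism:
        return parallelism      # full admission
    if fit >= min_parallelism:
        return fit              # partial admission
    return None                 # stays queued


fit = pods_that_fit(CPU_QUOTA, MEMORY_QUOTA_MI, CPU_PER_POD, MEMORY_PER_POD_MI)
admitted = admitted_pod_count(PARALLELISM, MIN_PARALLELISM, fit)
print(f"pods that fit: {fit}, admitted pod count: {admitted}")
```

CPU is the limiting resource here (9 pods fit by CPU, 184 by memory), so the admitted count equals the `job-min-parallelism` value of 9.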

Anything else we need to know?:

The Job reconciler logs an error while unsuspending the Job, because the `vjob.kb.io` admission webhook denies the update:

{"level":"error","ts":"2024-09-26T09:01:35.944600423Z","caller":"jobframework/reconciler.go:461","msg":"Unsuspending job","controller":"job","controllerGroup":"batch","controllerKind":"Job","Job":{"name":"sample-big-2vw2b","namespace":"default"},"namespace":"default","name":"sample-big-2vw2b","reconcileID":"1095dc93-51ac-4267-b317-83ba27e41a23","job":"default/sample-big-2vw2b","gvk":"batch/v1, Kind=Job","error":"admission webhook \"vjob.kb.io\" denied the request: metadata.annotations[kueue.x-k8s.io/job-min-parallelism]: Invalid value: 9: should be between 0 and 8"

...
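The error message suggests a range check on the annotation. As a rough sketch, assume the webhook requires the annotation to lie between 0 and `parallelism - 1`, and that the reconciler lowers `parallelism` to the admitted count (9) when it unsuspends the Job. Both assumptions are inferred from the message and the reproduction values, not taken from Kueue's source, and `validate_min_parallelism` is a hypothetical name.

```python
def validate_min_parallelism(min_parallelism, parallelism):
    """Hypothetical range check; returns an error string, or None if accepted."""
    upper = parallelism - 1  # assumed upper bound, inferred from the log message
    if not 0 <= min_parallelism <= upper:
        return f"Invalid value: {min_parallelism}: should be between 0 and {upper}"
    return None


# At submission: parallelism is 10, so a minimum of 9 is within 0..9.
print(validate_min_parallelism(9, 10))

# At unsuspend (assumed): parallelism has been reduced to the admitted count, 9,
# so the same annotation value now falls outside 0..8.
print(validate_min_parallelism(9, 9))
```

Under these assumptions the annotation is valid when the Job is created but becomes invalid once parallelism is reduced to exactly the minimum, which would match the rejected update in the log.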

mimowo added the kind/bug label Sep 26, 2024
mimowo changed the title from "[PartialAdmission] Job remains suspended if admitted when using partial-admission" to "[PartialAdmission] Job remains suspended when admitted using partial-admission" Sep 26, 2024

mimowo commented Sep 26, 2024

/assign
Assigning to myself initially; feel free to ping me on Slack if you are interested in taking it.


trasc commented Sep 27, 2024

/assign
