bug: not possible to add custom selectors #261

Closed
vsoch opened this issue Aug 17, 2023 · 12 comments
Comments

@vsoch
Contributor

vsoch commented Aug 17, 2023

Hi! I'm trying to add a custom selector and then pod anti-affinity. Specifically, my JobSet spec looks like this:

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: metricset-sample
  namespace: default
spec:
  failurePolicy: {}
  network:
    enableDNSHostnames: false
    subdomain: ms
  replicatedJobs:
  - name: "n"
    replicas: 1
    template:
      metadata:
        name: metricset-sample
        namespace: default
      spec:
        activeDeadlineSeconds: 31500000
        backoffLimit: 100
        completionMode: Indexed
        completions: 1
        parallelism: 1
        selector:
          matchLabels:
            metrics-operator-tenancy: sole-tenancy
        template:
          metadata:
            labels:
              app.kubernetes.io/name: metricset-sample
              cluster-name: metricset-sample
              metrics-operator-tenancy: sole-tenancy
              metricset-name: metricset-sample
              namespace: default
            name: metricset-sample
            namespace: default
          spec:
            affinity:
              podAntiAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                    - key: metrics-operator-tenancy
                      operator: In
                      values:
                      - sole-tenancy
                  topologyKey: kubernetes.io/hostname
            containers:
            - command:
              - /bin/bash
              - /metrics_operator/netmark-launcher.sh
              image: vanessa/netmark:latest
              imagePullPolicy: IfNotPresent
              name: launcher
              resources: {}
              securityContext:
                privileged: false
              stdin: true
              tty: true
              volumeMounts:
              - mountPath: /metrics_operator/
                name: metricset-sample
                readOnly: true
            restartPolicy: OnFailure
            setHostnameAsFQDN: true
            shareProcessNamespace: false
            subdomain: ms
            volumes:
            - configMap:
                items:
                - key: netmark-launcher
                  mode: 511
                  path: netmark-launcher.sh
                - key: netmark-worker
                  mode: 511
                  path: netmark-worker.sh
                name: metricset-sample
              name: metricset-sample
  - name: w
    replicas: 1
    template:
      metadata:
        name: metricset-sample
        namespace: default
      spec:
        activeDeadlineSeconds: 31500000
        backoffLimit: 100
        completionMode: Indexed
        completions: 1
        parallelism: 1
        selector:
          matchLabels:
            metrics-operator-tenancy: sole-tenancy
        template:
          metadata:
            labels:
              app.kubernetes.io/name: metricset-sample
              cluster-name: metricset-sample
              metrics-operator-tenancy: sole-tenancy
              metricset-name: metricset-sample
              namespace: default
            name: metricset-sample
            namespace: default
          spec:
            affinity:
              podAntiAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                    - key: metrics-operator-tenancy
                      operator: In
                      values:
                      - sole-tenancy
                  topologyKey: kubernetes.io/hostname
            containers:
            - command:
              - /bin/bash
              - /metrics_operator/netmark-worker.sh
              image: vanessa/netmark:latest
              imagePullPolicy: IfNotPresent
              name: workers
              resources: {}
              securityContext:
                privileged: false
              stdin: true
              tty: true
              volumeMounts:
              - mountPath: /metrics_operator/
                name: metricset-sample
                readOnly: true
            restartPolicy: OnFailure
            setHostnameAsFQDN: true
            shareProcessNamespace: false
            subdomain: ms
            volumes:
            - configMap:
                items:
                - key: netmark-launcher
                  mode: 511
                  path: netmark-launcher.sh
                - key: netmark-worker
                  mode: 511
                  path: netmark-worker.sh
                name: metricset-sample
              name: metricset-sample
  successPolicy:
    operator: All
    targetReplicatedJobs:
    - "n"
  suspend: false

But the controller always errors out on me. Note that I've tried both matchLabels and matchExpressions.

2023-08-17T19:25:26Z    ERROR   Reconciler error        {"controller": "jobset", "controllerGroup": "jobset.x-k8s.io", "controllerKind": "JobSet", "JobSet": {"name":"metricset-sample","namespace":"default"}, "namespace": "default", "name": "metricset-sample", "reconcileID": "add6cefc-5a66-464c-bf96-ce61816e5470", "error": "Job.batch \"metricset-sample-n-0\" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"controller-uid\":\"66ef794d-bf29-459f-ab53-9994c7d7c260\", \"metrics-operator-tenancy\":\"sole-tenancy\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: `selector` not auto-generated"}

I was trying to match the example here https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#namespace-selector although that is for a Deployment and not a JobSet! Should this work here, or am I missing a detail?

@kannon92
Contributor

The Job controller sets selectors for you, but there is a way to disable that. Have you tried manualSelector: true?

Reference: https://kubernetes.io/docs/concepts/workloads/controllers/job/#specifying-your-own-pod-selector
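Something like this (an untested sketch, trimmed down to the relevant fields from your spec; per the docs, a manual selector has to be unique or the Job may adopt pods it doesn't own):

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: metricset-sample
spec:
  replicatedJobs:
  - name: "n"
    replicas: 1
    template:
      spec:
        manualSelector: true   # tell the Job controller not to auto-generate the selector
        selector:
          matchLabels:
            metrics-operator-tenancy: sole-tenancy
        template:
          metadata:
            labels:
              metrics-operator-tenancy: sole-tenancy   # pod labels must match the selector above
          spec:
            containers:
            - name: launcher
              image: vanessa/netmark:latest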

@vsoch
Contributor Author

vsoch commented Aug 17, 2023

Ah yes, I can try that. But I'm worried that without the UID the controller normally sets, the correct grouping won't be maintained. Any reason we can't just add a selector (used for something else) alongside it? Why is it throwing the error?

@kannon92
Contributor

The error is happening because we generate a selector and then validate that it was the auto-generated one. Since you supply a different selector but don't tell the controller that you passed one in, it's in a bad state. Hence the error.

I think if you want to use your own selector then you would be bypassing the UID work. We use a UID-based selector by default, but since you want a different selector you have to override that behavior.
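For reference, the selector that ended up on your Job (pulled from the error above) is effectively:

selector:
  matchLabels:
    controller-uid: 66ef794d-bf29-459f-ab53-9994c7d7c260
    metrics-operator-tenancy: sole-tenancy

which, as I read it, is your label combined with the generated controller-uid; Job validation rejects it because it no longer matches a purely auto-generated selector and manualSelector wasn't set.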

@vsoch
Contributor Author

vsoch commented Aug 17, 2023

Maybe this is another way to set anti-affinity?

if topologyDomain, ok := js.Annotations[jobset.ExclusiveKey]; ok {

@vsoch
Contributor Author

vsoch commented Aug 17, 2023

I'm going to try:

jobspec.Template.ObjectMeta.Annotations = map[string]string{jobset.ExclusiveKey: "kubernetes.io/hostname"}

Not sure if that is an intended use case or documented anywhere, but worth a shot.

@vsoch
Contributor Author

vsoch commented Aug 17, 2023

Okay, I tried that, and this is the result!

spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: jobset.sigs.k8s.io/job-key
            operator: In
            values:
            - 2ec5915d6a9b04c61f8255ee06fbb52ec721772b
        namespaceSelector: {}
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: jobset.sigs.k8s.io/job-key
            operator: Exists
          - key: jobset.sigs.k8s.io/job-key
            operator: NotIn
            values:
            - 2ec5915d6a9b04c61f8255ee06fbb52ec721772b
        namespaceSelector: {}
        topologyKey: kubernetes.io/hostname

@alculquicondor you had suggested just adding anti-affinity for this use case here - it looks like this is adding both. Is this desired (and will it work to ensure 1 pod per node via hostname), or should we be able to explicitly ask for just one?

@alculquicondor

cc @danielvegamyhre

@vsoch
Contributor Author

vsoch commented Aug 21, 2023

@danielvegamyhre perhaps there should be a pair of finer-grained annotations that say to add only the affinity or only the anti-affinity section?

jobspec.Template.ObjectMeta.Annotations = map[string]string{jobset.ExclusiveKey: "kubernetes.io/hostname"}
jobspec.Template.ObjectMeta.Annotations = map[string]string{jobset.ExclusiveAffinityKey: "kubernetes.io/hostname"}
jobspec.Template.ObjectMeta.Annotations = map[string]string{jobset.ExclusiveAntiAffinityKey: "kubernetes.io/hostname"}

Or something along those lines? If JobSet is going to take control of unique selectors and expose these (more complex) options through annotations (an idea I really like), I think a user would want more granularity in their preferences.
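For illustration only (the annotation key below is made up to mirror the proposed jobset.ExclusiveAntiAffinityKey; it does not exist today), set on the pod template where the Go line above puts the existing annotation, it could look like:

replicatedJobs:
- name: w
  template:
    spec:
      template:          # pod template, where the annotation is set in the Go snippet above
        metadata:
          annotations:
            # hypothetical key: add only the anti-affinity rule, not the affinity rule
            jobset.sigs.k8s.io/exclusive-anti-affinity: kubernetes.io/hostname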

@danielvegamyhre
Contributor

Sorry for the late reply - can you clarify what the actual vs expected behavior is here?

When using exclusive affinities, the pod affinity/anti-affinity rules are appended to any existing ones, so you should still be able to specify whatever custom selectors you want in your custom pod affinity rules. Is there a bug that is causing them to be overwritten when you attempt to use exclusive affinities as well?

@vsoch
Contributor Author

vsoch commented Aug 23, 2023

No worries! We cannot create a custom selector unless we set manualSelector to true (JobSet creates its own unique one), and if I do that, I might as well create the affinity I need on my own too. I'm looking for an easier way, and I like the annotation approach, but @alculquicondor advised adding just the anti-affinity, and this seems to add both no matter what. So the problem is that I want an annotation for just the anti-affinity; I don't want the affinity.
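In other words, what I'd want generated is only the second half of the output I posted above, something like:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: jobset.sigs.k8s.io/job-key
          operator: Exists
        - key: jobset.sigs.k8s.io/job-key
          operator: NotIn
          values:
          - 2ec5915d6a9b04c61f8255ee06fbb52ec721772b
      topologyKey: kubernetes.io/hostname

with no podAffinity block added.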

@danielvegamyhre
Contributor

No worries! We cannot create a custom selector unless we set manualSelector to true (JobSet creates its own unique one), and if I do that, I might as well create the affinity I need on my own too. I'm looking for an easier way, and I like the annotation approach, but @alculquicondor advised adding just the anti-affinity, and this seems to add both no matter what. So the problem is that I want an annotation for just the anti-affinity; I don't want the affinity.

Hey, sorry this slipped through the cracks. We are planning a KEP for an actual Placement Policy API (#75) and will migrate away from the simple annotation-based approach (which lacks flexibility, as highlighted in your issue here). So I'm going to close this issue in favor of the umbrella issue for the Placement Policy API, where this use case can be considered while designing the new API.
