
Job-based Service Bus Scaler scales to too many instances #4554

Closed
eugen-nw opened this issue May 19, 2023 · 26 comments

Labels
bug Something isn't working

Comments

@eugen-nw commented May 19, 2023

Report

Say that I configure KEDA with minReplicaCount > 0. If I send Messages to the Queue, KEDA creates as many new Pods as there are Messages in the Queue, with no regard to the count of Jobs that are already running, i.e. those created because of minReplicaCount > 0.

Expected Behavior

Let's say that I configure KEDA to have 2 Jobs running permanently. If I send 5 Messages to the Queue, I'd expect KEDA to create only 3 new Pods. Instead it is creating 5 new Pods, so they match the count of Messages in the Queue. Below is the scaling behavior that the documentation at https://keda.sh/docs/2.9/concepts/scaling-jobs/ states.

[screenshot: scaling behavior description from the scaling-jobs documentation]

Actual Behavior

Please see above.

Steps to Reproduce the Problem

  1. Configure a KEDA Job deployment in a manner similar to the script below.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: aks-aci-boldiq-workforce-gozen-dev
  labels:
    app: aks-aci-boldiq-workforce-gozen-dev
    deploymentName: aks-aci-boldiq-workforce-gozen-dev
spec:
  jobTargetRef:
    template:
      spec:
        containers:  # this section is identical to that of a "kind: Deployment"
        - image: <removed>
          imagePullPolicy: Always
          name: boldiq-workforce-gozen-dev
          resources:
            requests:
              memory: 8G
              cpu: 4
            limits:
              memory: 8G
              cpu: 4
          env:
          - name: KEDA_SERVICEBUS_CONNECTIONSTRING_GOZEN_DEV
            value: <removed>
        nodeSelector:
          kubernetes.io/os: windows
        tolerations:
        - key: virtual-kubelet.io/provider
          operator: Exists
        - key: azure.com/aci
          effect: NoSchedule
        imagePullSecrets:
          - name: docker-registry-secret
        nodeName: virtual-kubelet
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  pollingInterval: 1  # 1 second polling for max. responsiveness
  minReplicaCount: 2  # keeping two instances running permanently in order to improve low loads' performance
  maxReplicaCount: 10
  triggers:
  - type: azure-servicebus
#    metricType: Value  # The default AverageValue with messageCount: '1' starts up a new Container for each Message in the Queue. We want that for responsiveness.
    metadata:
      queueName: gozen-dev-requests
      connectionFromEnv: KEDA_SERVICEBUS_CONNECTIONSTRING_GOZEN_DEV
      messageCount: '1'
  2. Deploy the script and check the count of Pods created. It should be 2.

  3. Send N Messages into the Queue.

  4. Check the count of Pods created. It will be N + 2.

Logs from KEDA operator

Please email edaroczy@boldiq.com for the .ZIP file.

KEDA Version

2.10.1

Kubernetes Version

1.25

Platform

Microsoft Azure

Scaler Details

Azure Service Bus

Anything else?

AKS 1.25.6
KEDA 2.10.2
The Containers run on the virtual-node-aci-linux virtual node.

@JorTurFer (Member)

Hi,
I believe the problem could be related to the short pollingInterval and the pod statuses. Since KEDA checks every second, the pods may not yet be in a running state, so KEDA thinks there are missing jobs.
You can try increasing the pollingInterval or setting more states in pendingPodConditions; a sketch follows below the screenshot.
[screenshot: pendingPodConditions section of the ScaledJob documentation]
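A minimal sketch of that tuning, assuming the default scaling strategy (the interval and the condition names below are illustrative, not taken from the manifest above):

spec:
  pollingInterval: 10        # poll less aggressively than every second
  scalingStrategy:
    # Pods in these states still count as pending, so KEDA does not
    # treat still-starting Pods as missing Jobs.
    pendingPodConditions:
      - "Ready"
      - "PodScheduled"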

@eugen-nw (Author)

pollingInterval should have no relationship to the count of Pods that are already running. If I have 2 Pods already running and 5 Messages in the Queue, then I need the scale-out to fire up only 3 new Pods.

@JorTurFer (Member)

Could you enable the debug logs and share them? The operator logs in debug mode expose the queue length and the current job count.

@eugen-nw (Author) commented May 24, 2023 via email

@JorTurFer (Member)

#4541 (comment)

@eugen-nw (Author)

I have the bandwidth now to address this issue. What exactly would you like me to do? Perhaps the steps below?

  1. Set minReplicaCount: 1 and the maxReplicaCount parameter for the Job in the deployment script, then deploy the script.
  2. Verify that the single minReplicaCount Pod has started up.
  3. Send a Message to the Queue and verify that there are 2 Pods running instead of one.
  4. Provide the current log of the keda-operator-* Pod that resides in the keda namespace.

The behavior I'd expect is that if I already have a Job running and I send a Message into the Queue, a second Job will not start up; instead, the currently running Job handles that one Message.

@JorTurFer (Member) commented May 31, 2023

3. Send a Message to the Queue and verify that there are 2 Pods running instead of one.

I think that this shouldn't happen.

The behavior I'd expect is that if I already have a Job running and I send a Message into the Queue, a second Job will not start up; instead, the currently running Job handles that one Message.

This is exactly the behavior I'd expect. Isn't this happening?

@eugen-nw (Author)

@JorTurFer No, it does not happen. If I have one Pod running, as per the minReplicaCount setting, and I then send a Message, I see a second Pod starting up.

I've tried it as well with minReplicaCount set to 2 and sending 5 Messages. The end result was that I got 7 Pods running, whereas only 5 would have been sufficient to process the 5 Messages.

@JorTurFer (Member)

@zroubalik, @tomkerkhove, is this behavior intended and I'm missing something, or is this a bug? I have checked the e2e tests and they cover this scenario.

@eugen-nw (Author) commented Jun 2, 2023

I thought about this a bit more and it may be a feature rather than a bug. Let's say that I configure a ScaledJob to have a minReplicaCount of 4. By this I express my desire to always have 4 Jobs on stand-by, ready to receive Messages. 2 Messages pop up, so two of my initial 4 Jobs become busy processing them and are no longer available. In response, the ScaledJob starts up two new Jobs immediately, in order to ensure that 4 Jobs will be available again soon.

Does this reasoning sound right to you guys?

@JorTurFer (Member)

Does this reasoning sound right to you guys?

I thought so; that's why I asked other teammates, because that's the behavior covered by the e2e tests. Maybe it's just a documentation gap, but I'm not sure.

@eugen-nw (Author) commented Jun 2, 2023

Thank you. Let's see what response we'll receive.

However, since there are tests that cover the behavior, it should be safe to update the documentation. And the behavior is indeed present; I've tested it several times in the past two weeks and it works very well :-))

@zroubalik (Member)

If you set minReplicaCount for a ScaledJob, then it is basically a minimum number of jobs (a base); anything else should trigger more jobs. See the PR: #3426. A worked example with the numbers from this issue follows below.
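To make that base semantics concrete, here is the arithmetic from this issue as a commented sketch (illustrative numbers, not taken from the linked PR):

# minReplicaCount is a floor of idle Jobs, not a scaling target:
#   minReplicaCount: 2    # idle Jobs kept on stand-by
#   queue length:    5    # pending Messages, with messageCount: '1'
#   Jobs created:    5    # one per pending Message
#   total running:   7    # 5 working + 2 idle, not max(5, 2) = 5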

@eugen-nw (Author) commented Jun 5, 2023

Thanks very much @zroubalik!

Would it be possible to enhance the documentation of minReplicaCount at https://keda.sh/docs/2.9/concepts/scaling-jobs/ to explain the scale-out behavior dictated by the minReplicaCount parameter? In its current state, the documentation only explains that minReplicaCount Jobs will be created by default.

[screenshot: minReplicaCount entry in the scaling-jobs documentation]

@JorTurFer (Member)

Would it be possible to enhance the documentation of minReplicaCount at keda.sh/docs/2.9/concepts/scaling-jobs to explain the scale-out behavior dictated by the minReplicaCount parameter?

It'd be amazing, because it's true that it could be a bit confusing. Would you open a PR in the docs with the change?

@eugen-nw (Author) commented Jun 5, 2023

I'll give it a try. My first open source contribution...

@zroubalik (Member)

I'll give it a try. My first open source contribution...

It's never too late to start 😄 Just fork the docs repo, create a new branch, add the information, and submit the PR. You might take some info or diagrams from the PR/issue I linked, if you find them useful.
Thanks 🙏

@eugen-nw (Author) commented Jun 6, 2023

Done: kedacore/keda-docs#1144

eugen-nw closed this as completed Jun 9, 2023
@LewisJackson1

@JorTurFer @zroubalik we were just reading the docs kindly added by @eugen-nw, and this really confused me. I can understand that someone may want this behaviour, but it feels like the expected behaviour here:

Let's say that I configure KEDA to have 2 Jobs running permanently. If I send 5 Messages to the Queue, I'd expect KEDA to create only 3 new Pods. Instead it is creating 5 new Pods, so they match the count of Messages in the Queue.

is going to be a more common use case, or at least desired by some users.

Scaling out too much will cost us a considerable amount of money as we're processing videos on GPU Nodes.

@eugen-nw (Author) commented Aug 16, 2023

You can limit the max. desired/allowed count of containers in the .yaml script; that will limit your expenses (see the sketch at the end of this comment). In your example you will get 5 Jobs created to handle your 5 Messages, plus 2 other Jobs on stand-by to handle whatever may come in, all of this once the 5 new Pods are up and functional.

My scale-out scenario has to accommodate sudden bursts in demand. The current operation mode enables me to have N containers (more or less) ready to immediately handle a burst.
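A minimal sketch of that cap (numbers illustrative; maxReplicaCount is the existing ScaledJob field, read here as a ceiling on how many Jobs KEDA will run):

spec:
  minReplicaCount: 2   # Jobs kept idle on stand-by
  maxReplicaCount: 5   # ceiling on concurrent Jobs, which caps spend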

@LewisJackson1

You can limit the max. desired / allowed count of containers in the .yaml script.

No matter what we set the max to, we're always going to be spinning up containers for no reason. If two items come into our queue, we don't need to spin up two additional Jobs, with their own GPU Nodes and the minimum charge that entails, when we already have two Jobs ready for them. If we set the maximum to the same as the minimum this wouldn't happen, but then we also would not be autoscaling.

My scale-out scenario has to accommodate sudden bursts in demand. The current operation mode enables me to have N containers (more or less) ready to immediately handle a burst.

I understand that this is a desirable use case for you and some others, but I doubt it's the behaviour most people would expect when they see this parameter (which is why this issue was created).

@JorTurFer (Member)

Hi @LewisJackson1,
So, you would like to always have minReplicaCount instances (let's say 2, for example), but when a job arrives you want one of those 2 to handle it, without extra instances being started to stay ready, right?
In that case, you want pre-warmed instances for the first jobs; but for subsequent jobs, is waiting acceptable? I mean, you already have some ready pods to process those jobs when there isn't any pending job. I'm probably missing something important in the middle, because I don't get your use case :(

If waiting is not a problem and you prefer to save as much money as possible, you can set minReplicaCount: 0 (or just not set it) and you will have 0 pending jobs.

@LewisJackson1 commented Aug 16, 2023
So, you would like to always have minReplicaCount instances (let's say 2, for example), but when a job arrives you want one of those 2 to handle it, without extra instances being started to stay ready, right?

Hello @JorTurFer, I'm not sure that I understand the question here, apologies!

In that case, you want pre-warmed instances for the first jobs; but for subsequent jobs, is waiting acceptable? I mean, you already have some ready pods to process those jobs when there isn't any pending job.

Yeah, if additional jobs came in beyond the minimum replicas, they would have to wait for scaling, and that's acceptable.

I guess the simplest way I can think of to illustrate this is to compare the behaviour to a ScaledObject (see the sketch at the end of this comment). If we configure a ScaledObject to track an SQS queue with 2 minimum replicas and 2 items enter the queue, the ScaledObject does not spin up 2 more Pods, is that correct?

We're looking at migrating a queue processor from ScaledObject to ScaledJob, and I'm just finding this inconsistency between the two defined behaviours quite weird. I think that we could work around this with a static Deployment that would always be warm, then set the ScaledJob to track additional queue items?
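For reference, a minimal sketch of the ScaledObject side of that comparison (the names, queue URL, and region are made up for illustration; authentication setup omitted):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: video-processor            # hypothetical
spec:
  scaleTargetRef:
    name: video-processor          # an existing Deployment
  minReplicaCount: 2               # floor of 2 Pods, which also consume messages
  maxReplicaCount: 10
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/000000000000/videos  # illustrative
      queueLength: "1"             # target messages per replica
      awsRegion: "eu-west-1"

With 2 messages in the queue and queueLength: "1", the HPA target is ceil(2 / 1) = 2 replicas, which the existing minimum already satisfies, so no extra Pods start.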

@JorTurFer (Member) commented Aug 16, 2023

We're looking at migrating a queue processor from ScaledObject to ScaledJob, and I'm just finding this inconsistency between the two defined behaviours quite weird.

Yes, you are right that they aren't consistent, but they aren't comparable either, IMHO. In a ScaledObject, the workload can process multiple items, so right after finishing one message the workload starts on the next without any cooldown. In a ScaledJob, your job usually takes a single message and ends, so after finishing the current message the pod terminates and KEDA spins up another job, which isn't instant. That's why the minimum replicas for a ScaledJob is the minimum number of replicas ready to work (idle).

This is an interesting discussion, and maybe the best place for it is a GH discussion, where other maintainers and any other community folk can give their 2 cents. Would you open a discussion about this?

In any case, to solve your use case you could create your own REST API (or gRPC server) with whatever business logic you want, and use the Metrics API Scaler (or External Scaler) to connect KEDA to it. With this approach you could set minReplicaCount: 0 and have your server report the desired number of instances at each moment; a sketch of that wiring is below.
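A minimal sketch of that wiring with the metrics-api scaler (the URL, service name, and valueLocation path are hypothetical; the endpoint that computes the desired Job count is yours to implement):

triggers:
- type: metrics-api
  metadata:
    targetValue: "1"                                           # one Job per reported unit
    url: "http://scaling-logic.default.svc/api/desired-jobs"   # hypothetical in-cluster service
    valueLocation: "desiredJobs"                               # GJSON path into the JSON response

KEDA polls the endpoint and sizes the Job count from the returned number, so the server can subtract already-warm instances before answering.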

@eugen-nw (Author) commented Aug 17, 2023

You may want to give the Job scale-out method some time to settle, and spend some time experimenting with both scale-out alternatives. Use Linux containers (vs. Windows) for faster Pod start-up times. Jobs will always handle arbitrarily long processing, should that be a concern. With ScaledObject scale-out you'll pay for unused capacity. The best scenario is to have no Pods running 24x7 and use ScaledJob to fire up Pods whenever necessary, if that setup accommodates your use cases.

I operate in the Azure cloud. Taking scale-out to the next level, I run no Pods in the Azure Kubernetes cluster itself but delegate them to the Azure Container Instances service by using a Virtual Kubelet. Thus we pay only for each second a Pod runs, and we can scale out indefinitely.

@LewisJackson1

This is an interesting discussion, and maybe the best place is in a GH discussion, where other maintainers and any other community folk can give their 2 cents. Would you open a discussion about this?

I've opened a discussion here: #4885

In ScaledJob, your job usually takes 1 single message and ends, so after finishing the current message, the pod finishes and KEDA spin up another job, which isn't instant. That's why the minimum replicas for ScaledJob is the minimum replicas ready to work (idle).

I feel like it is quite an opinionated stance for the scaler to assume that the user wants a buffer because their Jobs are slow to start up or terminate. I don't think there's that much difference between a Job and a persistent Pod; both have start-up latency, so the over-provisioning behaviour could also be useful there. I can understand that this might be desirable for some people, and it'd be great to have this behaviour available for both ScaledJob and ScaledObject as an opt-in/out.
