Not able to achieve streaming of Keda scaled jobs #5881
Comments
Any updates on this issue?
Makes sense, are you willing to contribute a fix?
This feature can probably resolve the issue.
@junekhan Should we set any specific parameter in scaledjob spec to resolve this issue?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity.
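In case it helps others who land here: the ScaledJob spec exposes a `scalingStrategy` field, and the `accurate` strategy is the setting most often suggested for this symptom, since it sizes the number of new jobs from the queue length rather than subtracting already-running jobs. Whether it resolves this particular report is not confirmed in the thread; the sketch below is illustrative and the name is a placeholder.

```yaml
# Hedged sketch: scalingStrategy is part of the ScaledJob spec,
# but its effect on this specific report is unverified.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: example-scaledjob     # placeholder name
spec:
  scalingStrategy:
    strategy: "accurate"      # alternatives: "default", "custom"
```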
We are running Generative AI workloads (GPU resources) using KEDA scaled jobs. We are not able to achieve streaming of KEDA scaled jobs, i.e. having new jobs start while existing jobs are still in progress.
Expected Behavior
Scenarios:
Keda scaled job settings --> pollingInterval = 30, maxReplicaCount = 10, parallelism = 1
(Assuming the SQS queue is empty before placing the messages in the queue for the below scenarios)
1 message in the queue → KEDA triggers 1 job/pod and processes it. If another message is placed in the queue while the 1st job is still running, KEDA will not trigger another job until the 1st job completes. We would expect KEDA to process subsequent messages even while existing jobs are in progress.
FYI: we did achieve streaming of batch jobs on AWS SageMaker, where we can create N jobs in parallel even while existing jobs are in progress.
Actual Behavior
Scenarios:
(Assuming the SQS queue is empty before placing the messages in the queue for the below scenarios)
1 message in the queue → KEDA triggers 1 job/pod and processes it. If another message is placed in the queue while the 1st job is still running, KEDA does not trigger another job until the 1st job completes.
We tried to address this by increasing parallelism to 5 (see Steps to Reproduce).
(Assuming the SQS queue is empty before placing the messages in the queue for the below scenarios)
1 message in the queue → KEDA triggers 5 jobs, but only 1 job processes a message; the other 4 jobs/pods sit idle. This is expensive, because we launch 4 GPU pods unnecessarily when there is only one message in the queue.
2 messages in the queue → KEDA triggers 5 jobs and processes 2 messages, but the other 3 pods sit idle.
Steps to Reproduce the Problem
Keda scaled job settings --> pollingInterval = 30, maxReplicaCount = 10, parallelism = 1
(Assuming the SQS queue is empty before placing the messages in the queue for the below scenarios)
1 message in the queue → Keda triggers 1 job/pod and processes it.
Keda scaled job settings --> pollingInterval = 30, maxReplicaCount = 10, parallelism = 5
(Assuming the SQS queue is empty before placing the messages in the queue for the below scenarios)
1 message in the queue → KEDA triggers 5 jobs, but only 1 processes a message.
2 messages in the queue → KEDA triggers 5 pods, but only 2 process messages.
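For reference, the settings above correspond to a ScaledJob roughly like the following. This is a sketch, not the reporter's actual manifest: the name, container image, and queue URL are placeholders.

```yaml
# Sketch of the reported configuration; identifiers are placeholders.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: genai-worker            # hypothetical name
spec:
  pollingInterval: 30
  maxReplicaCount: 10
  jobTargetRef:
    parallelism: 1              # 5 in the second scenario
    template:
      spec:
        containers:
          - name: worker
            image: example/genai-worker:latest   # placeholder image
        restartPolicy: Never
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/example-queue  # placeholder
        queueLength: "1"
        awsRegion: us-east-1
```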
Logs from KEDA operator
No response
KEDA Version
2.14.0
Kubernetes Version
1.29
Platform
Amazon Web Services
Scaler Details
AWS SQS Queue
Anything else?
No response