Orchestrator stuck in Running status even after backend jobs are completed #3078

pavan530 · 2025-04-04T10:05:33Z


Scenario and Current Deployment

We have a service built as a Durable Functions app with a Service Bus trigger and an orchestrator that calls two activity functions. It is packaged with a Dockerfile and deployed to Azure Container Apps. The service works as follows: the Controller service communicates with the PlanGenerator service, which in turn interacts with the Azure OpenAI LLM model and tools via APIs.

Explicit Settings in host.json

{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "hubName": "copilotplangeneratorhub",
      "maxConcurrentActivityFunctions": 50,
      "maxConcurrentOrchestratorFunctions": 50
    },
    "serviceBus": {
      "messageHandlerOptions": {
        "maxConcurrentCalls": 10,
        "autoComplete": false
      },
      "prefetchCount": 10
    }
  },
  "functionTimeout": "09:59:59",
  "retry": {
    "strategy": "fixedDelay",
    "maxRetryCount": 3,
    "delayInterval": "00:10:00" // 10 minutes delay after a failure.
  }
}
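For readers who want the timing values above as plain durations, here is a minimal Python sketch that converts the hh:mm:ss strings from the settings into `timedelta` objects. The `SETTINGS` dict and the `parse_timespan` helper are hypothetical names used only for illustration; they are not part of the Functions host or of our service.

```python
from datetime import timedelta

# Timing-related values copied by hand from the host.json settings above.
# The dict layout is an illustration, not the real host.json structure.
SETTINGS = {
    "functionTimeout": "09:59:59",
    "retryDelayInterval": "00:10:00",
    "maxRetryCount": 3,
}


def parse_timespan(value: str) -> timedelta:
    """Convert an hh:mm:ss string into a timedelta (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


function_timeout = parse_timespan(SETTINGS["functionTimeout"])
retry_delay = parse_timespan(SETTINGS["retryDelayInterval"])

print(function_timeout.total_seconds())  # 35999.0, one second short of 10 hours
print(retry_delay.total_seconds())       # 600.0
```

The configured timeout is therefore far longer than the 1-2 hours our backend jobs take.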

Function Timeout Behaviour

We have set a function timeout of approximately 10 hours. Despite this, long-running jobs do not complete even though the backend jobs have finished. The activity function stops and retries, and the logs show a function timeout followed by retries. Our backend jobs typically run for 1-2 hours at most.

To reproduce the issue, we decreased the function timeout to 00:19:59 and observed logs indicating a function timeout roughly every 19 minutes. If a backend job was running at that point, it was lost, and a retry occurred after 10 minutes (as configured), restarting the activity task. Below is the log snapshot for reference:

Snapshot from the Taskhub history table for the above highlighted instance
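The timing of the reproduction can be sketched as simple arithmetic. The Python snippet below is only a model of what we observed, not a description of how the Functions host schedules retries internally. It assumes that every attempt runs until the 00:19:59 timeout, that each retry restarts the activity from scratch after the 10-minute delay, and that `maxRetryCount: 3` means three retries after the first attempt. The `attempt_windows` function is a hypothetical helper.

```python
from datetime import timedelta


def attempt_windows(timeout, retry_delay, max_retries):
    """Return (start, end) offsets for the first attempt and each retry.

    Assumes every attempt is cut off at the timeout and the next one
    starts after the fixed retry delay (a model of the observed logs).
    """
    windows = []
    start = timedelta(0)
    for _ in range(max_retries + 1):  # first attempt plus the retries
        end = start + timeout
        windows.append((start, end))
        start = end + retry_delay
    return windows


timeout = timedelta(minutes=19, seconds=59)   # reduced functionTimeout
retry_delay = timedelta(minutes=10)           # delayInterval
windows = attempt_windows(timeout, retry_delay, max_retries=3)

for number, (start, end) in enumerate(windows, start=1):
    print(f"attempt {number}: starts at {start}, times out at {end}")

# A backend job at the low end of our typical range.
backend_job = timedelta(hours=1)
fits = backend_job <= timeout
print("job fits inside one attempt:", fits)
```

Under these assumptions no single attempt is long enough for a 1-hour backend job, which matches the lost jobs seen with the reduced timeout. It does not explain why the orchestrator stays in Running with the original timeout of about 10 hours.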

Behaviour for the First Message

Whenever a user sends a request to the service after a new revision is deployed, there is a wait of about 5 minutes. During this period the message is dropped or the reply is late. The orchestrator does schedule the task, but the task does not complete and stays stuck in the Running state. After restarting the service, we observe that the response for the same message is received. However, if a new message is sent to the new revision, the response is either significantly delayed or the orchestration remains in the Running state indefinitely.

Resource Allocation and Performance

We would like to understand how requests are served in our setup, which has a single replica with 16 vCPUs and 32 GB of memory. Does every activity function run in its own process and share compute resources from Azure Container Apps? Or does a single function app serve all requests, so that when the function app dies due to a timeout, all activity functions stop? How does this work?

@cgillum Would you be able to help here? Thanks


Replies: 0 comments
